📄 [Paper] LingView: A Web Interface for Viewing FLEx and ELAN Files

pathall · July 1, 2020, 12:46am

Thanks everyone for contributing here. I wrote some somewhat-related general thoughts over here.

I have been thinking so much about all the recent threads here that I have kind of frozen up: there is too much to say! I’m going to just go ahead and break that seal and try to pop in some small-scale thoughts…

Also, I really hope we can get lots of separate topics! I would hate to see a great idea for an app (for instance) fade away without discussion just because it’s deep in another interesting discussion. I hope everyone feels free to start topics! Site tip: there is a “reply as linked topic” option that you can get to by clicking the arrow in the top left of the editor. Then you can peel something off into a standalone thread while still (automatically) leaving a pointer at the first reference. I’ll be trying that a lot more myself; let’s see if it’s useful!

There are so many ideas here, and all of them are worthy of more discussion. I’m just going to make a little laundry list:

@lgessler’s example of a React application
Whether web components are a good path
@rgriscom’s mention of the OSF — after looking a bit more, I think this definitely warrants a topic of its own, and could be a reasonable archival target for a situation like @inigmendoza’s.
Data citation has come up repeatedly.
The Austin Principles — I wasn’t aware of these, myself.
The “who’s technical?” question — again, something worth exploring (and expanding!) as a group.

I would like to foreground the question of data, because I think data questions inform application design and implementation questions.

We need to standardize data formats that, at the very least, can handle granular and resolvable references to:

Time-aligned interlinear texts — Texts containing something like “sentences” (or “lines” or “utterances” or whatever you want to call them) with morphologically analyzed and glossed words (potentially with their own timestamps), as well as optional additional labels such as language, speaker, etc.
Grammatical categories — This one doesn’t get enough attention, I think. You and I “just know” that the abbreviation NOM can mean not only “nominative”, but also nominative case. In other words, a lot of the terms with which we are so familiar come in “category/value” pairs. Actually encoding such facts in one place in the documentary database is really important, and enables all kinds of cool interactions with our documentation.
Lexical materials — The words. Anything that we don’t want to repeat for every token should be recoverable from a lexical entry. And those entries can be as baroque as the linguists involved require, as long as there is an agreed-upon way to identify the word that each entry requires. I think this can be done without opaque identifiers (“Universally unique identifiers” or UUIDs, to use the lingo), instead using forms and glosses as a sort of “compound” identifier. I would love to talk more about this kind of thing here.
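To make the three kinds of data above a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the class names, the field names, the `CATEGORIES` and `LEXICON` tables, and the `resolve` helper are all hypothetical, and the toy English sentence just stands in for real documentation. It is not a proposal for the actual format, only one way the pieces could hang together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Morpheme:
    form: str
    gloss: str  # either a lexical gloss ("dog") or an abbreviation ("PL")


@dataclass
class Word:
    form: str
    morphemes: List[Morpheme]
    start: Optional[float] = None  # optional word-level timestamps, in seconds
    end: Optional[float] = None


@dataclass
class Sentence:
    start: float
    end: float
    words: List[Word]
    translation: str = ""
    speaker: Optional[str] = None


@dataclass
class Text:
    title: str
    language: str
    sentences: List[Sentence] = field(default_factory=list)


# Grammatical categories, stored once: abbreviation -> (category, value).
CATEGORIES: Dict[str, Tuple[str, str]] = {
    "NOM": ("case", "nominative"),
    "PL": ("number", "plural"),
}

# Lexical entries keyed by a compound (form, gloss) identifier, not a UUID.
LEXICON: Dict[Tuple[str, str], dict] = {
    ("dog", "dog"): {"pos": "noun"},
    ("bark", "bark"): {"pos": "verb"},
}


def resolve(m: Morpheme):
    """Look a morpheme up as a grammatical value first, then as a lexical entry."""
    if m.gloss in CATEGORIES:
        return ("grammatical", CATEGORIES[m.gloss])
    if (m.form, m.gloss) in LEXICON:
        return ("lexical", LEXICON[(m.form, m.gloss)])
    return None


text = Text(title="Toy example", language="English", sentences=[
    Sentence(start=0.0, end=1.4, speaker="A", translation="Dogs bark.", words=[
        Word("dogs", [Morpheme("dog", "dog"), Morpheme("-s", "PL")]),
        Word("bark", [Morpheme("bark", "bark")]),
    ]),
])

# Every token points back to exactly one shared definition.
for word in text.sentences[0].words:
    for m in word.morphemes:
        print(m.form, m.gloss, resolve(m))
```

The point of the sketch is that nothing about "plural" or about the entry for "dog" is repeated on each token: the token carries only a form and a gloss, and those two together are enough to find the shared information.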

I would go so far as to suggest that the data format is more important than any one particular application, no matter how powerful that application may be. After all, if we have a fairly standardized data format, then we can imagine lots of applications pipelined together for various and sundry reasons, passing data in the standard format along like a hot potato.
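As a small illustration of the hot-potato idea, here is a sketch of two tiny "applications" chained together. The dictionary layout, the step names (`only_speaker`, `add_durations`), and `run_pipeline` are all made up for this example; the only thing that matters is that every step takes a text in the shared shape and hands back a text in the same shape.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Text = Dict[str, Any]          # a text in the (hypothetical) shared format
Step = Callable[[Text], Text]  # every tool reads and writes that same format


def only_speaker(speaker: str) -> Step:
    """Build a step that keeps only the sentences by one speaker."""
    def step(text: Text) -> Text:
        kept = [s for s in text["sentences"] if s.get("speaker") == speaker]
        return dict(text, sentences=kept)
    return step


def add_durations(text: Text) -> Text:
    """Annotate each sentence with its duration in seconds."""
    sentences = [
        dict(s, duration=round(s["end"] - s["start"], 3))
        for s in text["sentences"]
    ]
    return dict(text, sentences=sentences)


def run_pipeline(text: Text, steps: List[Step]) -> Text:
    """Pass the text through each step in order."""
    return reduce(lambda doc, step: step(doc), steps, text)


text = {
    "title": "Toy example",
    "sentences": [
        {"speaker": "A", "start": 0.0, "end": 1.5, "transcription": "dogs bark"},
        {"speaker": "B", "start": 1.5, "end": 3.0, "transcription": "cats sleep"},
    ],
}

result = run_pipeline(text, [only_speaker("A"), add_durations])
print(result["sentences"])
```

Because each step returns a new dictionary instead of modifying its input, the steps can be reordered or swapped out without any of them knowing about the others.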

The design of such a format does not need to be an enormous undertaking. The very fact that everyone agrees that FLEx and ELAN are in a dysfunctional relationship is itself evidence that our community already has an idea of what data they want and need to be able to manipulate.

My personal preference for storing such data is to use JSON, not XML, since it’s so easy for humans to read and understand, and so trivial to parse in pretty much every programming language. It would also be reasonable to write importers and exporters between JSON and existing formats such as .eaf or flextext or whatever, where useful.
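For a sense of how little code the JSON side takes, here is a minimal sketch using Python's standard `json` module. The keys in the `sentence` dictionary are invented for the example, and `to_interlinear` is a hypothetical exporter to plain tab-separated interlinear lines. It is not an .eaf or flextext exporter, which would need real knowledge of those schemas.

```python
import json

# One sentence in a made-up JSON shape; the key names are assumptions.
sentence = {
    "start": 0.0,
    "end": 1.4,
    "speaker": "A",
    "translation": "Dogs bark.",
    "words": [
        {"form": "dogs", "gloss": "dog-PL"},
        {"form": "bark", "gloss": "bark"},
    ],
}

# ensure_ascii=False keeps IPA and other non-ASCII characters readable on disk.
serialized = json.dumps(sentence, ensure_ascii=False, indent=2)
restored = json.loads(serialized)


def to_interlinear(s: dict) -> str:
    """Export one sentence as three lines: forms, glosses, free translation."""
    forms = "\t".join(w["form"] for w in s["words"])
    glosses = "\t".join(w["gloss"] for w in s["words"])
    return "\n".join([forms, glosses, s["translation"]])


print(serialized)
print(to_interlinear(restored))
```

Writing the data out and reading it back is two function calls, and the exporter only needs to know the shared shape, not which tool produced the file.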

So… this still feels like a rather rambling post but I’m hoping if we all ramble together we can start to narrow in on actionable collaborations!