Continuing the discussion from Post ComputEL-5 Discussion:
Hi @SarahRMoeller! Sorry we missed you.
Sorry for the confusion; I should really put some work into setting up banners or some such on this site for upcoming stuff. There is an events plugin which is supposed to pick up timestamped events in posts and add them to a calendar buuuut…
bugs.
So I was trying to take notes in the background in a post as we talked, but it was hard to keep up as we had a pretty wide-ranging talk. Rather than kvetch I’ll just dump it in another topic here as-is, with the understanding that any of @cbowern, @lgessler, or @fauxneticien might want to edit it. I can say right now that some of these notes will be hard to interpret, but at least you can get a vague feeling for what we were talking about. I did make this a wiki post so you can edit the text directly if you like.
There did seem to be interest in a recurring meeting of this sort, or perhaps even an LSA panel.
Why are people willing to go through complicated workflows in order to use FLEx? (@cbowern)
- coding is scary
- a GUI is more comfortable
- SIL’s influence on linguistic practices: Toolbox, FLEx, Bible material content; even what is available for NLP and machine learning is built on what comes out of SIL work.
- Is this a historical development simply because missionaries were interested in linguistics for the purposes of conversion, or were linguists
- Standardization is useful, and what comes out of FLEx is semi-standard
- Toolbox was the only tool that was popular for so long
- The environment
- If you’re an SIL affiliate you kind of have to use FLEx stuff
- the right way to handle this is to decouple morphology from the core system
- not possible to create a default morphological parser
- the FLEx parser is limited
- vowel harmony, underspecification, doesn’t work well with tone languages, allotony
@claire
- re decoupling: ELAN is great for fieldwork data
- but legacy data is another story: different data types
- what do we do with all the old stuff?
- ELAN was nice with Toolbox: you could do transcription in ELAN, and then parse in Toolbox
@claire
- digital humanities tools are great if you want to
- if you want to incorporate parsers and so forth
- Ticha - workflows?
- Bardi - normalization of legacy materials was a big part of the project. Originals were all texts
The One App to Rule them All (@fauxneticien)
- the One App to Rule them All idea: we should think in terms of dataflows as opposed to app design, i.e. input/output problems
- even if different apps have the same “functional” content, if they can’t do I/O, there are roadblocks
a constituency tree: 1 text, 1 token, a span layer, relation layer, interdependencies between layers
use building blocks to build any kind of representation, and also multiple representations (say, constituency and named entities)
- text -
- tokens (maybe many)
- spans
- relations - source, target, value
could be common configurations
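To make the building-block idea a bit more concrete, here is a rough Python sketch of how I understood it: one text, a list of tokens, and then any number of named layers, each holding spans and relations (source, target, value). All of the class and field names (`Document`, `Layer`, `Span`, `Relation`, `span_text`) and the example sentence are my own inventions for illustration, not anything we agreed on in the meeting or any existing tool's format.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    start: int  # character offset into the text (inclusive)
    end: int    # character offset into the text (exclusive)


@dataclass
class Span:
    first: int  # index of the first token covered
    last: int   # index of the last token covered (inclusive)
    value: str  # label, e.g. "NP" or "PERSON"


@dataclass
class Relation:
    source: int  # index of a span within the same layer
    target: int  # index of a span within the same layer
    value: str   # label for the relation


@dataclass
class Layer:
    spans: List[Span] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


@dataclass
class Document:
    text: str
    tokens: List[Token]
    layers: Dict[str, Layer] = field(default_factory=dict)

    def span_text(self, span: Span) -> str:
        """Recover the stretch of text that a span covers."""
        return self.text[self.tokens[span.first].start:self.tokens[span.last].end]


# Hypothetical example: whitespace tokenization of a made-up sentence.
text = "Claire studies Bardi"
tokens = [Token(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
doc = Document(text, tokens)

# Layer 1: a constituency tree, with parent -> child links as relations.
doc.layers["constituency"] = Layer(
    spans=[Span(0, 2, "S"), Span(0, 0, "NP"), Span(1, 2, "VP"), Span(2, 2, "NP")],
    relations=[
        Relation(0, 1, "child"),  # S  -> NP
        Relation(0, 2, "child"),  # S  -> VP
        Relation(2, 3, "child"),  # VP -> NP
    ],
)

# Layer 2: named entities over the same tokens, independent of layer 1.
doc.layers["entities"] = Layer(spans=[Span(0, 0, "PERSON"), Span(2, 2, "LANGUAGE")])

vp = doc.layers["constituency"].spans[2]
print(vp.value, "=", doc.span_text(vp))
```

The point of the sketch is just that both representations sit on the same text and tokens, so a "common configuration" would amount to a particular set of layer names and label conventions on top of these four pieces.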
tradeoffs:
- in exporting a structure, you don’t have a meaningful representation