I was chatting with some people in my department the other day and we all agreed we didn’t know where to look for a curated list of apps for annotating linguistic data. Every now and then someone needs to start working with a new kind of data (e.g. constituency trees), and usually how that goes is they ask around, and maybe one or two people point out an app or two.
We looked around and couldn’t seem to find anything like this, so I went ahead and made something to try to fill this gap: the Map of Applications for Linguistic Annotation (MALA). You’ll notice that we’re trying to keep track of some important metadata that are often central to which app people end up choosing to work with (e.g. supported import/export formats and supported annotation types).
The list’s very small right now but I thought I’d share it here in case people are interested! And if you know of any other apps or have suggestions for how to improve the list, please open an issue.
Seems like RNLD (now Living Languages) had something like this on their old website. I thought there might be some to add from there tho I don’t think it had been updated very often. It seems to be gone now. So this is great, Luke!
That’d be very helpful, I think. There are plenty of people who know about apps but haven’t added them yet because they don’t exactly know how to open an issue/PR, don’t know the data format, etc. Some kind of form that makes it as approachable as possible would be super helpful.
It could live on the site and then generate a JSON file to be submitted by email or issue, perhaps.
I’ve wanted to include only properties that we can easily get for listed apps and future apps (in order to reduce the burden of maintenance and extension) though maybe it’d be fine to have some “optional” properties.
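To make the required-vs-optional idea a bit more concrete, here is a rough sketch of what a form could produce behind the scenes. Everything in it is an assumption for illustration: the field names, the `AppEntry` class, and the `ExampleAnnotator` app are hypothetical and are not MALA's actual data format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class AppEntry:
    """Hypothetical record for one app; not MALA's real schema."""
    # Required: properties that should be easy to find for any app.
    name: str
    url: str
    import_formats: List[str]
    export_formats: List[str]
    annotation_types: List[str]
    # Optional: filled in only when someone happens to know them.
    open_source: Optional[bool] = None
    notes: Optional[str] = None


def to_submission_json(entry: AppEntry) -> str:
    """Serialize an entry to JSON, leaving out optional fields that were not filled in."""
    record = {key: value for key, value in asdict(entry).items() if value is not None}
    return json.dumps(record, indent=2, ensure_ascii=False)


# Placeholder data, only to show the shape of a submission.
entry = AppEntry(
    name="ExampleAnnotator",
    url="https://example.org",
    import_formats=["txt"],
    export_formats=["json"],
    annotation_types=["constituency trees"],
)
submission = to_submission_json(entry)
print(submission)
```

Dropping unset optional fields keeps the submitted JSON short, so a contributor who only knows the basics can still send something useful by email or in an issue.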
Maybe this is a field in which we can collaborate. I’ve been working on two things:
A way to index software by the language it relates to; see my previous publication with Richard Littauer.
I’ve been working on setting up a literature review on Interlinear Glossed Text and annotation schemes. I have a Bibliography of several dozen resources so far.
This is perfect for a couple projects I’m involved with! I’ll start them on MALA.
One of our projects is looking for an easy tool for aligning or listening to audio while annotating. ELAN just wasn’t working. So also glad to hear people’s experiences!
One project wants one tool to correct, annotate, and leave comments on AAVE features in transcription and also align text to audio. The other project is basically to discover whether there is already an interface that would work well for integrating human-in-the-loop ML for interlinearization of texts (morphology, POS, grammatical relations, semantic roles, etc.). It would be something that is designed for or usable for NLP, has an intuitive interface for linguists, and is interoperable with FLEx and/or ELAN.
Maybe I can crowdsource some help here. I’m drafting some questions to guide the RA who will be exploring software for both projects at once:
Cost?
Open-source?
Designed for text only? Audio only? Can you align text and audio, or just playback audio?
Designed for active learning?
Import and export formats?
Collaborative?
Cited uses
Misc:
– Log-in, account required? Admin?
– What can be selected for annotation: character? subword? word? phrase? sentence?
– How many tiers of annotation?
– Stated purpose of design? Keywords?
– Ease of use? e.g. customer reviews
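If it helps the RA keep notes consistent across tools, the questions above could be recorded in a small structure and then filtered per project. This is only a sketch under my own assumptions: the `ToolReview` fields, the `shortlist` helper, and the `ToolA`/`ToolB` entries are hypothetical placeholders, not reviews of real software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolReview:
    """One row of the RA's notes; None means 'not checked yet'."""
    name: str
    cost: Optional[str] = None
    open_source: Optional[bool] = None
    aligns_text_and_audio: Optional[bool] = None
    active_learning: Optional[bool] = None
    import_formats: List[str] = field(default_factory=list)
    export_formats: List[str] = field(default_factory=list)
    collaborative: Optional[bool] = None
    cited_uses: List[str] = field(default_factory=list)
    notes: str = ""  # log-in, selectable units, tiers, stated purpose, ease of use


def shortlist(reviews: List[ToolReview],
              needs_alignment: bool,
              any_of_formats: List[str]) -> List[ToolReview]:
    """Keep tools that meet a project's hard requirements."""
    kept = []
    for review in reviews:
        # Unknown (None) counts as not meeting the alignment requirement.
        if needs_alignment and review.aligns_text_and_audio is not True:
            continue
        supported = set(review.import_formats) | set(review.export_formats)
        # "and/or" interoperability: one matching format is enough.
        if any_of_formats and not (set(any_of_formats) & supported):
            continue
        kept.append(review)
    return kept


# Placeholder entries, only to show how the filter behaves.
reviews = [
    ToolReview("ToolA", open_source=True, aligns_text_and_audio=True,
               import_formats=[".eaf"], export_formats=[".eaf", ".json"]),
    ToolReview("ToolB", open_source=False, aligns_text_and_audio=False,
               import_formats=[".txt"], export_formats=[".txt"]),
]
candidates = shortlist(reviews, needs_alignment=True,
                       any_of_formats=[".eaf", ".flextext"])
print([tool.name for tool in candidates])
```

Treating unanswered questions as `None` rather than `False` makes it easy to see which tools still need a closer look before they are ruled out.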
Interesting—I could see this being added as an optional metadatum for MALA if MALA seems like a good fit for it to you.
That sounds super interesting. When I was setting MALA up I also thought it’d be super helpful to have a similar list for annotation schemes instead of apps. I wonder if there’d be wide interest.