Introducing Plaid

lgessler · July 24, 2025, 5:51pm

I’m excited to share something I’ve been working on all summer. If you know me, you know it’s been a goal of mine to make AI models practically viable in language documentation work, and I’ve built something that I think can remove some of the historically intractable obstacles to this.

I call this app Plaid, and these are the most important goals:

Provide the core functionality of popular language documentation (LD) apps like ELAN and FLEx
Make it as easy as possible to plug an ASR/NLP model into any part of the workflow
Have multi-user collaborative editing as a default
Keep all past database states so that data is never lost (edit: I totally forgot to show you this in the demo vid! I’ll show that off during the tinker on Aug 1).
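
To make the "keep all past database states" goal concrete, here is a minimal sketch of the general idea. It is not Plaid's actual storage design: the `VersionedStore` class and its methods are hypothetical names used only for illustration. Every write appends a new state instead of overwriting the old one, so any earlier version can be read back or restored.

```python
from copy import deepcopy


class VersionedStore:
    """Toy key-value store that keeps every past state.

    Hypothetical illustration only; not Plaid's real data model.
    """

    def __init__(self):
        # Version 0 is the empty state; each write appends a new state.
        self._history = [{}]

    @property
    def version(self) -> int:
        """Index of the most recent state."""
        return len(self._history) - 1

    def current(self) -> dict:
        """Return a copy of the latest state."""
        return deepcopy(self._history[-1])

    def set(self, key, value) -> int:
        """Record a new state with key set to value; return its version."""
        new_state = deepcopy(self._history[-1])
        new_state[key] = value
        self._history.append(new_state)
        return self.version

    def at(self, version: int) -> dict:
        """Return a copy of the state as it was at the given version."""
        return deepcopy(self._history[version])

    def restore(self, version: int) -> int:
        """Make an old state current again by appending a copy of it.

        Nothing is deleted, so the states in between remain readable.
        """
        self._history.append(deepcopy(self._history[version]))
        return self.version
```

In this sketch, restoring is itself just another write, so an accidental restore can be reversed the same way as any other edit.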

While there’s still a lot of work left to be done, I think what I’ve made is just about ready for, say, a field methods course. Rather than writing too much more, here’s a demo ahead of the more elaborate one we’ll have in the August 2025 Tool Tinker:

One thing I’ll be able to say more about during the Tool Tinker is making new model integrations. I won’t call it “easy”, but I hope you’ll agree it is probably much less work than anything that’s come before: if all you want to do is plug in, say, a new ASR model, tokenizer, or morphological parser, all you need to do is write a Python file, with no backend or UI changes. See the existing implementations: the ASR integration for Whisper, for example, is just 200 lines of code.
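
To give a rough sense of what a single-file integration could contain, here is a minimal sketch of a tokenizer. It does not use Plaid's actual integration API, which is not described in this post: the `Token` class and the `tokenize` function are hypothetical names, and a real integration would also need whatever glue Plaid expects. The sketch only shows the kind of self-contained logic such a file might hold.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token record: surface text plus character offsets."""
    text: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


def tokenize(text: str) -> list[Token]:
    """Split text into word tokens and single punctuation marks.

    Offsets are kept so tokens can be aligned back to the source text.
    """
    # \w+ matches a run of word characters; [^\w\s] matches one
    # character that is neither a word character nor whitespace.
    pattern = re.compile(r"\w+|[^\w\s]")
    return [Token(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for tok in tokenize("Hello, world!"):
        print(tok)
```

Keeping character offsets on each token is a design choice made for this sketch: it lets downstream annotation layers point back into the original text without re-tokenizing.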

I’d be grateful for any feedback you have! I’m also planning to set up an instance of Plaid for demo use sometime soon, so please write in this thread if you’d like access to it.

lgessler · August 1, 2025, 11:37pm

Thanks all for the helpful comments at the Tool Tinker! I just wanted to summarize some of the discussion here.

First, my most immediate next steps on development are:

Morpheme-level annotation
.flextext import/export
Annotation of AV segmentations with speaker ID and other arbitrary data
Some kind of built-in morphological parser
.eaf import/export

Some ideas that were mentioned (I fear I might have missed many of them, so please tell me what I missed):

Some improvements over the FLEx parser, e.g. the case Amalia described where two morphemes that have been seen individually in other contexts don’t get parsed
Undo button (“restore historical version”, perhaps)
Notes at the annotation level
Citations: I assume this means being able to hyperlink to individual morphemes, say

Please do let me know your other thoughts here if you have anything to add later on.