Many of us here are interested in developing new tech for supporting fieldwork, and to this end, I thought it’d be nice to start an open-ended discussion about what an ideal fieldwork app would do. Many people use ELAN and FLEx for doing their documentation work, and we must respect the developers of these apps immensely for their groundbreaking and long-lived work in this space, but I know many people whose needs are not entirely served by either app.
So let’s pose the question: how do ELAN, FLEx, and other tools serve your needs at the moment, and in your ideal world, what would your app do for you? While I haven’t conducted serious fieldwork myself, I’ll get this thread started by sharing some thoughts I’ve heard from fieldworkers I know.
ELAN: nice because it has very configurable tiers and time-alignable annotations, but lacks support for “project-wide” data such as orthographies and morphemes/lemmas.
FLEx: has nice support for “project-wide” data like morphemes and lemmas, enabling bulk edits, but morpheme segmentation doesn’t always work as expected with some orthographies (esp. for tonal languages), syncing data with collaborators isn’t always straightforward, and it can be difficult to install.
As far as a new app goes, here are some ideas that have been in my head:
- “Free” dictionary: If the annotation app were implemented as a web application, it would store its data in a database (such as SQLite). Since dictionary websites and apps are primarily read-only, one could be built on top of that same database, giving anyone using the app a low-maintenance way to also provide their community with a dictionary.
- ML integration: There’s been a lot of recent work in the computational community on automating tasks fieldworkers have to do, such as audio transcription and interlinearization. It’s generally hard to integrate these algorithms into existing apps, but a good app would let machine learning models be plugged in so that, e.g., when an audio file is uploaded it is automatically transcribed, or when a word is entered it is automatically interlinearized.
- Import/export API: Some formats are quite common, such as EAF or FLEx XML, but people will inevitably need to import or export their data in less common formats. The best solution for such cases is a simple Python or JavaScript API that people with a minimum of programming knowledge can use to import or export their data.
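
To make the “free” dictionary and import/export ideas a bit more concrete, here is a rough Python sketch using the standard-library `sqlite3` module. Everything in it is an assumption for illustration only: the `entries` table, its columns, and the function names are hypothetical and don’t come from any existing app. The point is just that the annotation app and a dictionary site could open the same database file, with the dictionary side opening it read-only, and that an export can be as simple as handing back plain dicts for users to write out in whatever format they need.

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: a single table of lexical entries shared by both apps.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id    INTEGER PRIMARY KEY,
    lemma TEXT NOT NULL,
    gloss TEXT NOT NULL
);
"""


def open_for_annotation(path):
    """Read-write connection, as the (hypothetical) annotation app would use."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def open_for_dictionary(path):
    """Read-only connection to the same file, as a dictionary site would use."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def add_entry(conn, lemma, gloss):
    """Insert one entry; the `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO entries (lemma, gloss) VALUES (?, ?)", (lemma, gloss)
        )


def lookup(conn, lemma):
    """Return all glosses recorded for a lemma, in insertion order."""
    rows = conn.execute(
        "SELECT gloss FROM entries WHERE lemma = ? ORDER BY id", (lemma,)
    )
    return [gloss for (gloss,) in rows]


def export_entries(conn):
    """Export entries as plain dicts, ready to dump as JSON, CSV, etc."""
    rows = conn.execute("SELECT lemma, gloss FROM entries ORDER BY id")
    return [{"lemma": lemma, "gloss": gloss} for lemma, gloss in rows]
```

With this setup, any attempt to write through the connection returned by `open_for_dictionary` fails with `sqlite3.OperationalError`, so the dictionary side can’t accidentally modify the fieldworker’s data. A real app would need a much richer schema (texts, tiers, morphemes, orthographies), but the same split between a read-write and a read-only view would still apply.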