I wonder if we need a new label.
Among the many interesting observations in recent posts here, several have raised whether and how big-data-style approaches (machine learning, NLP, computational linguistics, and so forth) could be applied in language documentation.
I’m all for the exploration and application of techniques of that ilk. But I also find myself thinking about how essentially all of those approaches to working with language really can’t get out of the gate without a sizeable corpus. It would be awesome if there were part-of-speech taggers and speech-to-text and text-to-speech and automated glossing and all that stuff for every language on the planet. But the fact of the matter is that for the vast majority of languages — and certainly for the vast majority of languages that documentary linguists and their colleagues work on — there just isn’t enough data to bootstrap such systems, yet.
Maybe I’m misunderstanding what it takes to get going with numerically-oriented techniques. I know that certain NLP tasks can reach high accuracy with a fairly small corpus. (Part-of-speech tagging is often put forth as the standard example of a task for which high accuracy is fairly easily attainable.)
But isn’t it an unavoidable fact that we are not beyond simple data entry? Put bluntly, we’re going to have to type. A lot. Even if we have help with glossing, and even if Zipf’s law is on our side, reasonable documentation of a language involves typing thousands and thousands of unique sequences of characters.
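To make the Zipf point concrete, here is a minimal sketch that counts tokens, unique types, and words seen only once in a transcript. It assumes plain whitespace tokenization and a made-up toy sentence; the function name `type_token_profile` is purely illustrative, and a real corpus would need orthography-aware tokenization.

```python
from collections import Counter

def type_token_profile(text):
    """Count tokens, unique types, and types seen exactly once (hapaxes)."""
    # Assumption: lowercasing and splitting on whitespace is good enough here.
    tokens = text.lower().split()
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {"tokens": len(tokens), "types": len(counts), "hapaxes": hapaxes}

# Toy data, invented for illustration only.
sample = "the dog saw the cat and the cat saw a bird"
profile = type_token_profile(sample)
print(profile)
```

The types that occur only once are the ones no autocomplete or frequency list can help with the first time around: somebody still has to type each of them.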
To my mind, this boils down to the fact that we face, primarily, a user interface problem. How do we make that difficult task somewhat less difficult? Well, by thinking in terms of user interfaces. By thinking, a lot, about design. About how linguists actually get their work done. About studying data lifecycles.
But this kind of stuff is traditionally not really in the linguist’s wheelhouse. Ling 101 does not help you decide whether a dropdown or a checkbox is going to work better in your documentary task. And yet, I think, we do need to learn about that. We need to talk about, and share, how to turn what we do into the design of tools that help us do it.
So what is that process called?
“Documentation design”?
“Digital documentation”?
<your suggestion here…>