What do you wish you'd been taught about language documentation?

Hello all! Since I’m in the process of putting together an online langdoc class from scratch, I’d love to expand beyond my own experience and ask you all:

What do you wish you’d been taught about language documentation?

Relatedly, what were you taught that has been most helpful in allowing you to carry out language documentation projects successfully?

That tiers aren’t what people think tiers are.

Understanding that fact explains 90% of the incompatibility between FLEx and ELAN, for a start.

Hmmm… could you enlighten us or link to where you may have already explained this?

I think on a general level, the most useful things I was taught weren’t very technical. The most important was probably that each project and person is different and one shouldn’t be afraid to do things differently from others if it works for your situation.
I think multi-tier ELAN searches would be something useful to be taught early on, rather than figuring them out yourself later. They really help with finding constructions and materials for elicitation.
That’s all I can think of right now, but obviously there is so much more… but again, depends on the goals.

Sorry about that comment, it was well-nigh useless. Long week.

Anyway, for one thing I totally agree with @Sandra that the most important things about fieldwork are not technical, they’re interpersonal. I guess in that regard it’s kind of like saying “I wish I had known so-and-so better when we started working together” but of course the only recipe for getting to know someone is time.

As for the tier business, firstly I cannot enlighten anyone — this morning I put my underwear on the wrong way and walked backwards all day. Worth bearing in mind.

I guess what I was trying to say is this: ideally, a corpus of parallel text (transcriptions and translations) and a lexicon (a list of unique form/gloss pairs) grow together. In principle, one shouldn’t ever have to gloss a word once it’s been glossed before. (Especially if it’s a giant, complex word!)
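To make the "gloss it once" idea concrete, here is a minimal sketch in Python. It assumes the lexicon is nothing more than a dictionary from word forms to glosses; the function names (`gloss_words`, `unglossed`) and the Spanish toy sentences are made up for illustration and don't correspond to how FLEx or ELAN store anything.

```python
def gloss_words(words, lexicon):
    """Pair each word with its stored gloss; None marks a word not yet in the lexicon."""
    return [(word, lexicon.get(word)) for word in words]


def unglossed(pairs):
    """Return the forms that still need a gloss from the annotator."""
    return [form for form, gloss in pairs if gloss is None]


# Hypothetical toy lexicon: starts empty and grows with the corpus.
lexicon = {}

# First sentence: nothing is known yet, so every word needs a gloss.
first = gloss_words("el gato duerme".split(), lexicon)
todo_first = unglossed(first)

# The annotator glosses each new form once, and the lexicon remembers it.
lexicon.update({"el": "the", "gato": "cat", "duerme": "sleeps"})

# Second sentence: previously glossed words come back for free.
second = gloss_words("el perro duerme".split(), lexicon)
todo_second = unglossed(second)  # only the genuinely new word is left
```

The point of the sketch is just the direction of data flow: the text asks the lexicon first, and only the leftovers go to a human. Real words are of course messier (homophones, allomorphs, multiple senses), so a single form-to-gloss mapping is a simplification.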

The reason I was harping on tiers is that the way they get talked about often sort of smears together data types that aren’t really comparable: if you look at something like IJAL interlinear formatting guidelines, you see this kind of thing:

Note 6 says “indent second line of long examples”… but, the whole shebang is described as a “three-line” format. I mean, we know what they mean of course. It’s not just nitpicking, though. I would argue that the problem is that “interlinears” aren’t really a series of “tiers” or “lines”. There are “sentence-level” lines like the transcription line (when present — IJAL calls those “four-line” format) and of course the free translations. But the other two “tiers” are both at the word level, not the sentence level. That this is the case is obvious from the way that they wrap (and in IJAL style, indent upon wrapping).
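Here is a rough sketch of what I mean, in Python. It assumes an interlinear example is a sentence-level free translation plus a list of word-level form/gloss pairs; the class names (`Word`, `Interlinear`), the `render` function, and the Spanish example are all hypothetical. Wrapping happens between word columns, so a form and its gloss always move together, and continuation lines are indented.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str   # word-level
    gloss: str  # word-level, always travels with its form


@dataclass
class Interlinear:
    words: list       # word-level data: a list of Word pairs
    translation: str  # sentence-level data: one free translation


def render(example, width=40, indent=4):
    """Lay out form/gloss columns, wrapping whole columns and indenting continuations."""
    lines = []
    forms, glosses = "", ""
    for w in example.words:
        col = max(len(w.form), len(w.gloss)) + 2  # column wide enough for both
        # Wrap only if the current row already holds a word and this one won't fit.
        if forms.strip() and len(forms) + col > width:
            lines += [forms.rstrip(), glosses.rstrip()]
            forms = glosses = " " * indent
        forms += w.form.ljust(col)
        glosses += w.gloss.ljust(col)
    lines += [forms.rstrip(), glosses.rstrip()]
    lines.append("'" + example.translation + "'")
    return "\n".join(lines)


example = Interlinear(
    words=[Word("los", "the.PL"), Word("gatos", "cat-PL"), Word("duermen", "sleep-3PL")],
    translation="The cats sleep.",
)
rendered = render(example, width=16)  # narrow width to force a wrap
print(rendered)
```

Notice that the "lines" only come into existence at rendering time. The data itself has no second or third line, just pairs inside a sentence, which is why the number of printed lines changes with the page width while the data doesn't.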

If one is typing into a word processing program, there really isn’t any way to make this happen, you just have to accept the “lines-only” model. And in practice ELAN sort of enforces the same model — it’s true that you can do “child tiers” and so forth but the steps required to do that are… obscure.