Am geezer.
In September 2011, someone asked this question on linguistics.stackexchange.com:
I’m trying to use interlinear glossing to show the structure of a sentence to an audience without requiring them to learn the language in question.
Are there any tools for quickly creating an interlinear gloss and getting the corresponding HTML (or other markup snippet)?
Ideally, I wouldn’t have to write raw HTML to achieve this, but one does what one must do.
The thread never got a crazy amount of attention, but after all these years my post is still there.
I still think that analysis is more or less okay, but it only addresses the markup (HTML/CSS) side of things. Markup is important, but it’s the smaller problem. The question isn’t just “how do I create interlinear text as HTML?” It is, rather: “How do we (not “I”!) model documentary data in a simple, extensible, reusable way, and how do we build applications that allow us to use that model productively?”
The markup problem is in fact much simpler than the modeling problem. It really isn’t a big deal to take well-structured data and stamp out decent HTML from it, and many projects and papers have done exactly that for a long time. (Even in that post I link to a text in Gothic which is still there.) I could dig some references out of my bibliography if anyone is interested.
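To make that concrete, here is a minimal sketch of what “well-structured data plus a dumb renderer” might look like. Everything in it is hypothetical (the type names, the CSS class names, the toy Spanish example); the point is just how little machinery the markup side actually requires.

```typescript
// A deliberately tiny model of one interlinear example: a row of
// words, each paired with an aligned gloss, plus a free translation.
// All names here are hypothetical, for illustration only.
interface GlossedWord {
  form: string;  // surface form, e.g. "perro-s"
  gloss: string; // aligned gloss, e.g. "dog-PL"
}

interface InterlinearExample {
  words: GlossedWord[];
  translation: string;
}

// Stamp out HTML: one <div> per word, with the form stacked over its
// gloss. CSS along the lines of `.word { display: inline-block; }`
// (the approach from the original post) handles the column alignment.
function renderInterlinear(ex: InterlinearExample): string {
  const words = ex.words
    .map(
      (w) =>
        `<div class="word"><span class="form">${w.form}</span>` +
        `<span class="gloss">${w.gloss}</span></div>`
    )
    .join("");
  return (
    `<div class="interlinear">${words}` +
    `<div class="translation">‘${ex.translation}’</div></div>`
  );
}

// A toy example, invented for illustration:
const example: InterlinearExample = {
  words: [
    { form: "perro-s", gloss: "dog-PL" },
    { form: "ladra-n", gloss: "bark-3PL" },
  ],
  translation: "dogs bark",
};
console.log(renderInterlinear(example));
```

A real implementation would escape the strings and handle morpheme-level alignment, but the point stands: given well-structured data, the HTML is the easy part.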
No matter how lovely the data structure is, if we don’t have software to do stuff with it, it just doesn’t matter. And the fact of the matter is that our software has been stagnating for ages. I started grad school in 2011 (GET OFF MY LAWN), and in terms of the day-to-day tools that working documentary linguists use, essentially nothing has changed. That’s not to say that there haven’t been amazing developments in server-based tools; all the Max Planck stuff in particular is amazing (and WALS has come a long way since this 2009 version). But that’s an “end-product” thing: it doesn’t help anyone do the documentation in the first place. The same is true of archives. They’re great, we love them, but they are not documentation tools; they’re archival tools.
I have found in my “career” that one can only really hope to get a few things heard. I mean, if you have institutional support and a chair and a gang of grad students or whatever, maybe you can get several things heard. But… yeah. Everybody else, you gotta just keep yelling upwind. Nine years later, my throat is dry from yelling this one thing: if we get the data structure of documentary data right, and if we come up with ways to edit and use that data structure, then we can do a lotta stuff.
When I wrote that post, quite frankly, I had no idea how to implement a system for editing structured data on the web. (For instance, I mentioned a JavaScript library called Backbone.js, https://backbonejs.org/, for which I still have an inordinate fondness but which pretty much no one uses anymore.) And here we are, still facing the same problem, and we don’t really have an easy answer for Martin’s question.
But we can get there. The web platform has improved considerably, even since 2011. There are more “angles” we can take to onboard people into becoming active participants in a software ecology for documentary linguistics. We need to think of this problem as our problem. We need to start imagining solutions, all of us. I happen to have some ideas to throw into the ring, but if something better comes along I will be thrilled with that too. The important thing is that we could be using the web better, and it’s going to take a village to figure out how to do that.
GO TEAM
Also, it appears that my original post was made at 3:46am. Ah, the good old days.