Thanks, @nikopartanen, for prompting me to write this!
One of the cool things about using the web for language documentation is that you can format interlinear text right.
What does “right” mean? It means:
- Words wrap correctly
- Every word has its own tag
- Every “tier” of every word has its own tag
- Sentence-level “tiers” like the transcription and one or more translations also have their own tags
In this post I’m going to share with you what I have come to believe is the simplest way to meet these requirements with HTML and CSS. We’re going to look at two sentences, one very short and one very long. The short sentence has just three words, the long one many.
So here is a simple short sentence (this is from a text in Hiligaynon), presented in a tabular format:
transcription | pahigád kamo da!
---|---
translation | Get out of my way!
words | pa-higád (CAUS-move), kamo (2PL.ABS), da (there)
(raw data)
{
  "transcription": "pahigád kamo da!",
  "translation": "Get out of my way! stay on the side",
  "words": [
    {
      "form": "pa-higád",
      "gloss": "CAUS-move"
    },
    {
      "form": "kamo",
      "gloss": "2PL.ABS"
    },
    {
      "form": "da",
      "gloss": "there"
    }
  ]
}
I won’t melt your eyes with the full version of this presentation for the long sentence, but here is its table if you’d like to see it:
transcription | indí gustó si Juan nga mag-ininawáy kamó da’ magcomment kamó nga amo ní eskwelahán nyo amo ní amo ná
---|---
translation | Juan doesn’t like for you to fight in the comments about this school or that school
words |
Obviously these tabular representations are not what we want; we want standard interlinear notation. We can get there with HTML that looks like this:
<div class="sentence">
  <p class="transcription">pahigád kamo da</p>
  <div class="words">
    <p class="word">
      <span class="form">pa-higád</span>
      <span class="gloss">CAUS-move</span>
    </p>
    <p class="word">
      <span class="form">kamo</span>
      <span class="gloss">2PL.ABS</span>
    </p>
    <p class="word">
      <span class="form">da</span>
      <span class="gloss">there</span>
    </p>
  </div>
  <p class="translation">Get out of my way!</p>
</div>
If we just stick that into a page without any CSS on it, we get this rather weird-looking thing:
pahigád kamo da
pa-higád CAUS-move
kamo 2PL.ABS
da there
Get out of my way!
It’s not awful, but it’s not standard notation.
The little demo below shows how the presentation can be fixed. Fixing the wrapping behavior comes down to applying a single CSS rule:
.word {
  display: inline-grid;
  margin-right: 1em;
}
So, what the heck does that mean? I would suggest that for now, if you don’t have experience with CSS, you simply ignore the question of what the rule “means”. Rather, focus on a different question: what the heck does that do? This rule is enough to get sensible interlinear formatting that matches the way we are used to seeing interlinear text.
To help you get a feel for what the CSS is doing, I made this little demo for you to play with:
Here are some things to try. If you toggle option #1, you will see borders added to every tag in the markup. Notice:
- Every “tier” for each word (the form and the gloss) has its own tag (a <span>, in this case).
- There are boxes around each word. This is important! We need this tag to serve as the target for the rule setting the display and margin-right properties, as shown above.
- There is a box around the whole list of words. This corresponds to the <div class="words"></div> tag in the markup.
Which particular tags we’re using is not as important as the nesting pattern. In other words, it matters less that we chose to wrap the “form” bits of the interlinear in <span> tags (as opposed to <p> tags or something) than that each form (and each gloss) is inside something that corresponds to a word. Here’s the hierarchy:
- sentence
  - transcription
  - words
    - word
      - form
      - gloss
    - word
      - form
      - gloss
    - word
      - form
      - gloss
    - word
  - translation
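Because this hierarchy mirrors the raw data shown earlier, the markup can be generated rather than typed by hand. Here is a minimal sketch in JavaScript, assuming data shaped like that raw data. The function names (escapeHtml, renderWord, renderSentence) are made up for illustration and don't come from any library, and the translation is trimmed to match the markup above.

```javascript
// Escape text so a form or gloss can't accidentally break the markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One word: a <p class="word"> holding a form tier and a gloss tier.
function renderWord(word) {
  return [
    '    <p class="word">',
    `      <span class="form">${escapeHtml(word.form)}</span>`,
    `      <span class="gloss">${escapeHtml(word.gloss)}</span>`,
    "    </p>",
  ].join("\n");
}

// One sentence: transcription, then the list of words, then the translation.
function renderSentence(sentence) {
  return [
    '<div class="sentence">',
    `  <p class="transcription">${escapeHtml(sentence.transcription)}</p>`,
    '  <div class="words">',
    sentence.words.map(renderWord).join("\n"),
    "  </div>",
    `  <p class="translation">${escapeHtml(sentence.translation)}</p>`,
    "</div>",
  ].join("\n");
}

// Example data in the same shape as the raw data for the short sentence.
const sentence = {
  transcription: "pahigád kamo da!",
  translation: "Get out of my way!",
  words: [
    { form: "pa-higád", gloss: "CAUS-move" },
    { form: "kamo", gloss: "2PL.ABS" },
    { form: "da", gloss: "there" },
  ],
};

const html = renderSentence(sentence);
console.log(html);
```

The resulting string has the same nesting as the hand-written markup, so if it is placed in a page (for example by assigning it to an element's innerHTML), the .word rule applies to it unchanged.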
It’s easier to see the benefits of this pattern if we consider a (much) longer sentence, like the one below.
Notice that if you resize your browser (or if you’re already reading this on a phone), the words wrap correctly. This is not a trivial feature: it’s very important, especially when we consider that many of the communities documentary linguists work with have limited access to large-screen devices, but stable access to handheld devices.
There’s more to be said about this, but in general, if we stick to this tag hierarchy pattern, we can do lots of cool stuff to modify it with CSS.