Web Standards for making Books: Paged Media and Generated Content (…which still don’t work in browsers)

Quick thought… It would be great if we could make books with HTML and CSS: I mean books with features like running headers and page numbers and footnotes and marginalia and all that good stuff.

The web, after all, is mostly based on the scrolling model; paradoxically enough, there aren’t really any pages in web pages. But print still matters, a lot. You already know that, but see @Hilaria’s awesome work creating kids’ books in Chatino (here, here and here)! Just the amount of attention that this project has gotten is evidence that the demand for, and interest in, print linguistic materials runs deep.

Right now we have to rely on specialized software for creating books: LaTeX, InDesign, and many others. Which is of course great; whatever works.

However, there’s only one stepping stone missing for crossing the river between web documentation and print documentation. And that stepping stone is, in a sense, already there, in the form of Web Platform standards. Two of them are specifically relevant to this task:

Paged Media

https://www.w3.org/TR/css-page-3/

This CSS module specifies how pages are generated and laid out to hold fragmented content in a paged presentation. It adds functionality for controlling page margins, page size and orientation, and headers and footers, and extends generated content to enable page numbering and running headers / footers.
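To make that description a bit more concrete, here is a minimal sketch of the kind of rules the Paged Media module is about. The trim size, margins, and the `.wide-table` class are my own illustrative assumptions, not anything from a real project.

```css
/* Sketch only: sizes and margins are illustrative assumptions. */
@page {
  size: 6in 9in;           /* trim size of the printed page */
  margin: 0.75in 0.625in;  /* top/bottom, then left/right */
}

/* Wider inner (gutter) margins on facing pages. */
@page :left  { margin-right: 0.875in; }  /* verso: gutter is on the right */
@page :right { margin-left: 0.875in; }   /* recto: gutter is on the left */

/* A named page type, here used to turn wide tables sideways.
   ".wide-table" is a hypothetical class name. */
@page wide { size: 9in 6in; }
.wide-table { page: wide; }
```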

Generated Content

https://www.w3.org/TR/css-content-3/

Authors sometimes want user agents to render content that does not come from the document tree. One familiar example of this is numbered headings; the author does not want to mark the numbers up explicitly, they want the user agent to generate them automatically. Counters and markers are used to achieve these effects.
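The numbered-headings example from that quote can be sketched with CSS counters. This assumes chapters are marked up as `h1` and sections as `h2`, which is just a guess about the markup.

```css
/* Assumption: h1 = chapter heading, h2 = section heading. */
body { counter-reset: chapter; }

h1 {
  counter-increment: chapter;  /* next chapter number */
  counter-reset: section;      /* sections restart in each chapter */
}
h1::before { content: counter(chapter) ". "; }

h2 { counter-increment: section; }
h2::before { content: counter(chapter) "." counter(section) " "; }
```

The numbers never appear in the HTML itself; the user agent generates them, so reordering sections renumbers everything for free.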

These are technical standards and probably won’t make a whole lot of sense if you don’t mess around with HTML and CSS a lot, so I thought I would describe them in layperson’s terms (at least as well as I understand them!).

Some of the things that are possible with these standards are apparent in this screenshot of a page from some guy’s recently completed dissertation:

There are several things to note:

  • “Leaders” — the dots between section titles and page numbers
  • Page numbering — note that it’s possible to do things like change from Roman numerals (for prefatory material) to Arabic numerals for main content
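As a rough sketch of how those two features are expressed, as far as I understand the drafts: `leader()` and `target-counter()` come from the Generated Content draft, and the margin boxes and `page` counter from Paged Media. The class names `.toc`, `.frontmatter` and `.mainmatter` are hypothetical.

```css
/* Table of contents: assumes each entry is a link to a heading's id,
   inside an element with the hypothetical class "toc". */
.toc a::after {
  /* Dots out to the margin, then the page the link target lands on.
     The draft writes attr(href url); some formatters document attr(href). */
  content: leader('.') target-counter(attr(href url), page);
}

/* Default: Arabic page numbers at the foot of the page. */
@page {
  @bottom-center { content: counter(page); }
}

/* Prefatory material: lowercase Roman numerals on a named page type. */
@page front {
  @bottom-center { content: counter(page, lower-roman); }
}
.frontmatter { page: front; }

/* Main content: restart the numbering. */
.mainmatter { page: main; counter-reset: page 1; }
```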

I didn’t do this in my dissertation (maybe I should have), but you can also add running headers that “know” whether they’re on the right or left page (“recto” and “verso”), so you can do that thing where the verso page has the author (say) and the recto has the section heading.

So this kind of thing:
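In CSS terms, a minimal sketch of recto/verso running headers might look like the following. It relies on named strings (`string-set` and `string()`) as drafted in the generated-content specifications; the author name is a placeholder and using `h2` for section headings is an assumption.

```css
/* Capture the text of each section heading into a named string. */
h2 { string-set: section-title content(); }

/* Verso (left-hand) pages carry the author. */
@page :left {
  @top-left { content: "Author Name"; }  /* placeholder text */
}

/* Recto (right-hand) pages carry the current section heading. */
@page :right {
  @top-right { content: string(section-title); }
}
```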

These are especially important in dictionaries:
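For dictionaries, the same named-string mechanism can in principle produce guide words (the first and last headword on a page). This is a sketch under the assumption that every headword carries a hypothetical `.headword` class.

```css
/* Every headword updates the named string as the page fills up. */
.headword { string-set: guideword content(); }

/* Left page shows the first headword on that page... */
@page :left {
  @top-left { content: string(guideword, first); }
}

/* ...and the right page shows the last one on that page. */
@page :right {
  @top-right { content: string(guideword, last); }
}
```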

We need this kind of stuff!

The Problem

Web browsers don’t support it yet.

:unamused:

If they did support it, imagine creating some web documentation which, when you click “print”, actually creates a book PDF!
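The natural home for all of this would be a print stylesheet. As a small sketch of the idea, here is a `@media print` block that handles only fragmentation (where pages break); the selectors `nav`, `.sidebar` and `.chapter` are assumptions about the site’s markup.

```css
@media print {
  /* Drop the parts of the web page that make no sense on paper. */
  nav, .sidebar { display: none; }

  /* Each chapter starts on a new page. */
  .chapter { break-before: page; }

  /* Don't strand a heading at the bottom of a page. */
  h2, h3 { break-after: avoid; }

  /* Keep at least two lines of a paragraph together at page edges. */
  p { orphans: 2; widows: 2; }
}
```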

I do sometimes wonder if organizations like the LSA might get into advocacy of web standards that have clear relevance to language documentation, reclamation, pedagogy, and so forth.

More resources:

This is from 2012, not much has changed. :-/

This is software which implements these standards as a stand-alone tool. (I used it to create the PDF version of my dissertation.) Works a treat, but there are commercial constraints on its use, and it ain’t cheap.

There’s an example of a paged dictionary written with HTML/CSS here (under “Dictionary”). Apologies if this is something that you showed me first, which I’m now showing back to you!

Hi @skalyan yes I think we did talk about that one — it’s very impressive.

I do find the HTML version rather… weird. There are “page breaks” before every letter, but each letter section is rendered in two columns — which means finding a given word requires two scrolls through a whole section, which seems very impractical to me.

But that’s easy to fix using CSS media queries!

(P.S. Props to you for consistently typesetting abbreviated names of programming/markup languages in a monospaced font!)

Yes, or even just changing

.chapter { 
  page: chapter;
  columns: 2;
  column-rule: 0.2pt solid black;
  text-align: justify;
}

to:

.chapter { 
  page: chapter;
  columns: 1;
  column-rule: 0.2pt solid black;
  text-align: justify;
}

Which results in something like this:

Which seems much more usable to me online.
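Combining the two suggestions above, one possible sketch keeps the two-column layout for print and falls back to a single column on screen. It reuses the `.chapter` rule from the example; I haven’t tested it against the actual dictionary.

```css
/* Shared rules: single column by default, which is what screens get. */
.chapter {
  page: chapter;
  text-align: justify;
}

/* Only the printed (paged) version gets two columns and the rule between them. */
@media print {
  .chapter {
    columns: 2;
    column-rule: 0.2pt solid black;
  }
}
```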

Actually, as long as we’re talking about dictionary formatting… it’s kind of interesting to look at the markup on this Icelandic dictionary. It’s incredibly flat:

<p>
<b>auga</b> (gen. pl. <b>augna</b>), 
n. <i>eye</i>; lúka (bregða) upp augum, bregða augum í sundr, 
<i>to open (lift up) the eyes</i>; 
lúka aptr augum, <i>to shift the eyes</i>; 
renna (bregða, leiða) augum til e-s, <i>to turn the eyes to</i>; 
leiða e-n augum, <i>to measure one with the eyes</i>; 
berja augum í e-t, <i>to take into consideration</i>; 
koma augum á e-t, <i>to set eyes on, become aware of</i>; 
hafa auga á e-u, <i>t have, keep, an eye upon</i>; 
segja e-m e-t í augu upp, <i>to one’s face, right in the face</i>; 
unna e-m sem augum í höfði sér, <i>as one’s own eye-balls</i>; 
e-m vex e-t í augu, <i>one has scruples about</i>; 
náit er nef augum, <i>the nose is neighbor to the eyes</i>; 
gløggt er gests augat, <i>a guest’s eye is sharp</i>; 
mörg eru dags augu, <i>the day has many eyes</i>; 
eigi leyna augu, ef ann kona manni,
 <i>the eyes cannot hide it if a woman loves a man</i>; 
(2) <i>hole, aperture</i> in a needle (nálarauga), 
in a millstone (kvarnarauga) or an axe-head; 
(3) <i>pit</i> full of water.
</p>

Basically there is <i> for English text and <b> for headwords, and that’s it. Consequently, you end up with a pretty inflexible “wall of text”, very similar to (albeit prettier than) the 1910 original:

auga (gen. pl. augna), n. eye; lúka (bregða) upp augum, bregða augum í sundr, to open (lift up) the eyes; lúka aptr augum, to shift the eyes; renna (bregða, leiða) augum til e-s, to turn the eyes to; leiða e-n augum, to measure one with the eyes; berja augum í e-t, to take into consideration; koma augum á e-t, to set eyes on, become aware of; hafa auga á e-u, t have, keep, an eye upon; segja e-m e-t í augu upp, to one’s face, right in the face; unna e-m sem augum í höfði sér, as one’s own eye-balls; e-m vex e-t í augu, one has scruples about; náit er nef augum, the nose is neighbor to the eyes; gløggt er gests augat, a guest’s eye is sharp; mörg eru dags augu, the day has many eyes; eigi leyna augu, ef ann kona manni, the eyes cannot hide it if a woman loves a man; (2) hole, aperture in a needle (nálarauga), in a millstone (kvarnarauga) or an axe-head; (3) pit full of water.

I don’t know about you, but I find reading text like this particularly difficult on a screen. More detailed, structured markup would open up a lot of opportunities for alternate renderings online, where whitespace doesn’t cost money.
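To illustrate what that could buy us, here is a sketch that assumes the entry had been marked up with hypothetical classes (`.entry`, `.headword`, `.sense`, `.example`, `.gloss`) instead of bare <b> and <i>. None of these class names exist in the actual dictionary.

```css
/* On screen: give each sense and each example its own line. */
.entry { margin-block: 1.5em; }
.entry .headword { font-weight: bold; font-size: 1.25em; }
.entry .sense   { display: block; margin-inline-start: 1.5em; }
.entry .example { display: block; margin-inline-start: 3em; }
.entry .gloss   { font-style: italic; }

/* In print: fall back to the compact run-in layout of the original. */
@media print {
  .entry .sense,
  .entry .example {
    display: inline;
    margin-inline-start: 0;
  }
}
```

With flat <b>/<i> markup there is nothing for selectors like these to hook into, which is why the rendering is stuck as a wall of text.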

This reminds me of a bit of related work discussed here: Dictionary XHTML Proposed Standard - Pathway (try the last link on the page first).