What the heck is a Private Use Area?

pathall · September 2, 2022, 2:51pm

Here’s something that I have been learning about recently that I thought might be worth a quick writeup: the Private Use Area of Unicode.

The core idea of Unicode is actually quite simple: every “letter” or “sign” in every writing system on earth (and beyond!) should have a unique identifying number. Unicode is a standard list of such identifiers. Typically, Unicode identifiers (called codepoints) are represented as hexadecimal numbers. But that’s just because hexadecimals are easier to deal with in computers for various computer-people reasons. The codepoints could just as easily be represented as more familiar decimal numbers. The point is that every letter is assigned a unique, standard codepoint as an identifier.
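To make the hexadecimal-versus-decimal point concrete, here is a minimal Python sketch. The function name `codepoint_forms` is just something made up for this post; `ord` is the built-in that returns a character's codepoint.

```python
def codepoint_forms(ch: str) -> tuple[int, str]:
    """Return a character's codepoint as a decimal int and as U+XXXX hex notation."""
    cp = ord(ch)                  # the codepoint as an ordinary integer
    return cp, f"U+{cp:04X}"      # same number, written in the usual hex style


# The same identifier, written two ways.
for ch in ["A", "\u00e9"]:       # "A" and "é"
    decimal, hexadecimal = codepoint_forms(ch)
    print(ch, decimal, hexadecimal)
```

Either form identifies the same character; the hex form is simply the convention you will see in Unicode charts.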

Unicode is always changing: there are plenty of scripts that haven’t been encoded yet, and there is a defined process for adding them (writing a proposal and going through a submission process).

So what is the Private Use Area? I guess an easy way to explain it would be to say that it’s a “Standardized Non-Standard” subset of Unicode numbering. The idea is, sometimes you need to work with a particular set of symbols that no one else outside of a certain circle of people will ever see, and which is explicitly not standard.

Unicode has set aside ranges of codepoints for this purpose (the best known is U+E000 to U+F8FF). Those codepoints are permanently reserved: no future version of Unicode will ever assign standard characters to them. It’s a “use at your own risk” part of Unicode, because if you create a font that maps glyphs onto those codepoints and (say) publish a web page using that font, you can’t be certain that someone else isn’t using the same codepoints for something completely different.
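If you want to check whether a given character sits in the Private Use Area, something like the following Python sketch should work. The helper name `is_private_use` and the `PUA_RANGES` table are my own illustration; the three ranges are the ones defined by the Unicode Standard, and the standard library's `unicodedata.category` reports private-use codepoints as `"Co"`.

```python
import unicodedata

# The three private-use ranges defined by Unicode (inclusive bounds).
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch has a private-use codepoint."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


print(is_private_use("\ue000"))            # first codepoint of the BMP range
print(is_private_use("A"))                 # an ordinary, standardized letter
print(unicodedata.category("\ue000"))      # general category of a PUA codepoint
```

A check like this can be handy for flagging text that will only display as intended with one specific font installed.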

There are valid reasons for doing this. @clriley and other colleagues and I have been working on the syllabary for the Loma language of Liberia and Guinea. The hope is that some day the syllables of the Loma syllabary will have non-PUA codepoints of their own, and thus become a standardized part of Unicode.

But in the meantime, for the purposes of development and testing, it’s useful for all the people working together to have a font which encodes the syllabary using PUA codepoints: that way we can look at the glyphs, build tables and test documents, etc., until (we hope) the submission ultimately succeeds.

skalyan · September 3, 2022, 8:39am

Great summary of the PUA and what it’s good for!

Let me know if you ever need help developing a font for the Loma syllabary; I have some experience in “fontifying” scripts that only exist in handwritten form (having previously done so for the Otomaung alphabet used by speakers of the Papuan language Naasioi). Though if you have a Unicode submission prepared, chances are you already have a font of some sort.