An interesting article on using Transkribus to OCR 11th-13th C. Tibetan texts… in cursive manuscripts!
I’m not sure if this scan is from the collection in question, but it might suffice as an indication of the kinds of texts they’re dealing with:
Y’all. Transkribus is nuts.
The reported “CER” (Character Error Rate) figures are… a little amazing:
| Model name | No. of pages checked | CER% for Training Set | CER% for Validation Set |
| --- | --- | --- | --- |
| Model A | 40 | 1.39% | 4.28% |
| Model B | 80 | 1.35% | 4.45% |
| Model C | 120 | 1.18% | 4.73% |
| Model D | 160 | 1.15% | 2.33% |
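
For anyone wondering what a CER actually measures: as it's usually defined, it's the character-level edit distance (insertions, deletions, substitutions) between the model's output and a hand-checked reference transcription, divided by the length of the reference. Here's a rough sketch of that textbook definition. I don't know exactly how Transkribus computes its numbers (normalization, whitespace handling, and so on), so treat the function names and the choice to count Unicode code points rather than full Tibetan stacks as my own assumptions, not the project's method.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character edits turning ref into hyp."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate as a fraction: edit distance / reference length.

    Assumption: characters are Python str elements (Unicode code points).
    """
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

Multiply the result by 100 to get a percentage. By this definition, a validation CER of 2.33% works out to a bit more than 2 character edits per 100 characters of reference text.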
There’s a nice video about the project here: