New OCR Model: olmOCR

lgessler · February 27, 2025, 4:43pm

It’s been a really exciting time lately for low-resource OCR! AI2 released a cutting-edge OCR model called olmOCR a couple days ago.

I tried it on a page from a Garo-English dictionary:

Despite numerous challenges (mixed Bengali/Latin script, word-internal dots in Latin script), I think it does a fantastic job transcribing it. Here is its output with no edits:

abaku

a’bachenggipa, (dakchen ningpa ; gital). a beginner ; a novice ; a pioneer ; abori- gines ; primitive ; origin ; fundamental ; primyrya. aboriginal ; primitive ; be- ginning. a’-ba chenmting. inchoate. just begun ; rudimentary. (বিল) শিক্ষা করা। (বি আশা বিন্দ আবহাও।…সত্তা বিন্দাক্ষর ; উপাধিবিন্দ ; বিন্দ)।

A’bakun, n. (dakgni ; kam ; kamtang), a duty ; a work ; means of livelihood. v. A’bakun gaa, (kam janga ; man’oroa). to progress ; to improve. n. progress ; improvement A’bakun gaata, (kam jangata), to further ; to promote ; to help forward ; to assist. (বিল) অধ্যয়ন ষোলিত। (বিল) উপাধিবিন্দ। A’banggir, n. (gamtchi gipa ripok jatsa), a blue crystal. bead necklace. নাবি কৃত্তিকমিন হার।

Abastro, n. (tinggro, longitude, আব্যান)। A’se, v. (A’a be-waka ; dare ruu), to fall off, as land ; to be detached and fall. n. landslip ; falling of land. (বিল) নালি চড়া, নালি চড়া। (বিল) পুরা-তলানা।

A’bel, n. (A’dubak ; dipo A’taru), deep mire; mud ; bog ; soft ground ; slough (বিল) কৃত্তিকমিন। লোগো কৃত্তিকমিন।

A’belati, n. (a-chi gipok), clay ; china clay. (বিল) আচিকমিন। চীনা কালি।

A’ben, n. (A’ara ; ro’onggi a’n), loam ; soft earth ; stoneless earth, (বিল) পলন; কৃত্তিকমিন।

A’bang, n. (A-chik songi salgpengi jo don gipa A-chick hende). Garos living in the southe rn portion of the Garo land ; A’bang aro. the region where A’bang man lives. (বিল) নাবি ভাষার পক্ষে শাস কবা চীনা ভাষার সম। তাদের মা।

Abet, n. (kori ; betbetgipa ; agandipetgipa), a talkative man ; an evil. -a. garrulous; talkative (বিল) কথা প্রকাশ করেন। (বিল) ক্ষতিকারক। n. Abet-Rangge, an evil spirit হালি নিদাশী। adv. abetabet, (jingring). repeatedly ; frequently ; unnecessarily. (বিল) চীনা চীনা অপরাধ।

Abi, n. (abitang abigipa) an elder sister ; my elder sister ; (বিল) কথা। কথা কথা ano-abi. (noabi), sister ; কথা কথা nouab, a cousin sister. নুবো বা কন্যার আনুগত্য।

A’bibia, n. (a’ani biba ; a’oso), bad smell, that emits from the decomposed matter in earth ; atmosphere, (বিল) দাঁতি ধার বলিয়া দেয়। পুরুষ কথা মারু।

A’bibak, n. (a’banak ; a’ganguri), the north or south pole. (বিল) উত্তর বা দক্ষিণ দেয়।

A’bibol, n. (a’bima), the globe ; the Earth ; the World ; a ballshaped object ; (বিল) লোকাধি পৃথিবী ; পৃথিবী পৃথিবী।

A’bibrom, n. (a’ba a’gilsakai ga’brong), the exis of the Earth. (বিল) পৃথিবী অফ।

A’birding, n. (a’anding ; a’rikging ; a’ginda reng), ridge ; range. ডাব। (বিল) ফায়র আদিন।

Abigipa, n. (abi ; abitang, an elder sister ; my elder sister. (বিল) কথা কথা।

A’ibka, (a’-bi-ka), n. (a’galbang a’knings the heart of the Earth. (বিল) পৃথিবীর পৃথিবী।

Abilik, n. (bilik jatsa), a genus of bean. (বিল) পৃথিবী অফ।

Abim, n. (jongdanggap jatsa), a kind of worm, that can roll itself into a ball. (বিল) সাত সাত। পুরুষ পুরুষ। এতেব পুরুষ।

There are still problems, of course. The Bengali-script output looks unreliable: in the last line, for example, it is a very poor match for what’s in the image.
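To check the per-script reliability noted above more systematically, one option is to split each transcribed line into Latin and Bengali runs so that each script can be compared against the page image (or a hand-corrected transcription) on its own. The following is only a minimal sketch: the helper names `script_of`, `script_runs`, and `text_in_script` are made up for illustration, and labelling characters by the prefix of their Unicode name is a rough heuristic rather than a proper script lookup.

```python
import unicodedata
from itertools import groupby


def script_of(ch: str) -> str:
    """Coarse script label for one character, taken from its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("BENGALI"):
        return "bengali"
    if name.startswith("LATIN"):
        return "latin"
    # Spaces, ASCII digits, and punctuation such as the danda land here.
    return "other"


def script_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of characters sharing one script label."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


def text_in_script(text: str, script: str) -> str:
    """Join all runs of the requested script, separated by single spaces."""
    return " ".join(run for label, run in script_runs(text) if label == script)


if __name__ == "__main__":
    # One entry copied from the olmOCR output above.
    line = "A’belati, n. (a-chi gipok), clay ; china clay. (বিল) আচিকমিন। চীনা কালি।"
    print("Latin:  ", text_in_script(line, "latin"))
    print("Bengali:", text_in_script(line, "bengali"))
```

With the two scripts separated, the Bengali portion can be reviewed by someone who reads Bengali without the Latin-script headwords getting in the way.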

cbowern · March 1, 2025, 12:51pm

Thanks! I’m looking forward to trying this. I tried some material that I thought was straightforward with ChatGPT, and it was laughably disastrous.

fauxneticien · March 4, 2025, 5:06am

@lgessler, any interest in demoing this for the March tinker Zoom?

lgessler · March 4, 2025, 3:59pm

sure, would love to!

larp · March 4, 2025, 5:26pm

It attempts to OCR tables, which is pretty impressive.

IXF 1. Agemmay

Isekkilen n tmaziyt d wi :

asekkil	isem-is	tifinay	amek inţeq	amedya
a	a (ney : ayra)	Φ	a	aman
b	ba		b	bib
c	ca	če	ch	bru
d	yeč	d	tch	amcic
d	da	dh	d	ečč
e	đar		dadda	dadda
f	ilem		adar	adar
g	fa		els	eler
gw	gaw		argaz	argaz
ġ	yeğ		gma	gma
y	γar		agwad	agwad
h	ha		egğ	egğ
h	him		iyi	iyi
i	i (ney : iγri)		hud	hud
j	ja		imi	imi
k	ka		ajenjar	ajenjar
			rki	rki

Tamasheq has 33 consonants, featuring six manners of articulation and eight places of articulation. There are no non-pulmonic consonants. The consonants are detailed in the table below.[4]:23

	Labial	Alveolar	Palato-alveolar	Velar	Uvular	Pharyngeal
		plain	pharyngealized
Plosive
voiceless	(p)	t	tf	k	(q)	(?)
voiced	b	d	dj	gi	g
Fricative
voiceless	f	s	s̈	χ	(n)	h
voiced	z	z̛	ž̛	k̚	(…)
Nasal		m	n
Liquid			l
rhotic			r
Approximant		w	j

lgessler · March 6, 2025, 11:01pm

Another OCR drop! Mistral OCR | Mistral AI

skalyan · October 25, 2025, 8:04am

Just came across this newer OCR model, which claims to support 90+ languages: GitHub - datalab-to/surya: OCR, layout analysis, reading order, table recognition in 90+ languages. Haven’t tried it myself.

dvanesch · October 26, 2025, 10:39am

A few more OCR models have come out over the last few weeks. I haven’t tried any of them yet, but it might be interesting to run a larger-scale benchmark at some point in a langdoc context, comparing a selection of the models in this thread (that could also be a cool ComputEL paper).

nanonets/Nanonets-OCR2-3B · Hugging Face
deepseek-ai/DeepSeek-OCR · Hugging Face
GitHub - PaddlePaddle/PaddleOCR: Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages. (new PaddleOCR-VL 0.9B model)
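For a benchmark like the one suggested above, a common starting point is character error rate (CER): the Levenshtein edit distance between a hand-corrected reference transcription and a model's output, divided by the reference length. The sketch below is one possible way to score and rank models, not a method anyone in this thread has used. The function names and the model labels in the usage example are placeholders, and the code assumes each model's output has already been saved as plain text.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis against a non-empty reference."""
    # Normalize both sides so composed and decomposed forms compare equal.
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def rank_models(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (model, CER) pairs sorted from lowest to highest error."""
    scores = [(model, cer(reference, text)) for model, text in outputs.items()]
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Placeholder strings; a real run would load gold and model transcriptions.
    gold = "clay ; china clay."
    outputs = {"model_a": "clay ; china clay.", "model_b": "cloy ; chino clay"}
    for model, score in rank_models(gold, outputs):
        print(f"{model}: CER = {score:.3f}")
```

CER treats every character equally, so for mixed-script pages it may be worth reporting it separately per script as well as overall.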

And some layout parsing models:

dvanesch · October 26, 2025, 12:07pm

HF has a blog post and some inference cost comparisons for some of these models: Supercharge your OCR Pipelines with Open Models

And I found one more that looks interesting: Logics-MLLM/Logics-Parsing · Hugging Face

I found some notes on how to get DeepSeek OCR set up locally: Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code

Another fairly recent model seems to be IBM’s Docling: @sungkim.bsky.social on Bluesky with a demo here: granite-docling-258M demo - a Hugging Face Space by ibm-granite

cbowern · October 30, 2025, 5:56pm

medieval-data (Medieval Data) also has a full-page OCR approach, which seems to work reasonably well, for Latin at least.