💻 What format should I store data in?

Sandra · April 21, 2020, 9:06am

Re OCR: There are Python packages that can OCR a document and extract the text to a table. I didn’t end up using them, because the OCR just didn’t work well with all the diacritics, and an informal study revealed that it was faster for my student assistant to just type it up than to do OCR, extract the text, and then hand-correct it. But if you have high-quality PDFs with few diacritics, it might be worth it. Also note that not all OCR is created equal. The free ones are not as good as the paid options, unfortunately.
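
If you want to run a similar informal check on your own material, one rough approach is to type up a short sample by hand, OCR the same passage, and compare the two. The sketch below is not what I used; it is only an illustration of one way to do it with the Python standard library. It computes a character error rate (edit distance divided by the length of the hand-typed text) as a rough proxy for how much hand-correction would be needed. The function names and the sample strings are made up for the example, and both texts are Unicode-normalized first so that composed and decomposed diacritics are not counted as errors.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance relative to the reference length, after NFC normalization."""
    ocr_nfc = unicodedata.normalize("NFC", ocr_text)
    ref_nfc = unicodedata.normalize("NFC", reference)
    if not ref_nfc:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_nfc, ref_nfc) / len(ref_nfc)


# Hypothetical sample: the OCR output lost two diacritics.
reference = "žluťoučký kůň"   # hand-typed text
ocr_output = "zlutoučký kůň"  # what an OCR tool might return
print(f"character error rate: {char_error_rate(ocr_output, reference):.2%}")
```

A high error rate on a representative page suggests that typing will be faster than correcting. Where the cutoff lies depends on your material and on who is doing the typing, so treat the number as a guide only.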