
I wonder how hard it would be to run a couple pages of dense print (though in a monospaced and consistent format) through an OCR system.

I might play with Tesseract[1] this weekend and see if this is even a feasible idea. If so, it makes the paper key storage a lot more palatable.

[1]: https://github.com/tesseract-ocr/tesseract



On several separate occasions I tried to use Tesseract to OCR base64 and similar text that was printed in a normal monospace font (i.e. not a special OCR font) and scanned.

I never got even close to useful results. I tried limiting the alphabet and disabling the language models as far as I could, and at best I got a few recognizable character sequences right out of the whole page. I got the impression that the whole thing depends heavily on being able to split the text into English words and on having easily separated paragraphs.
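For anyone who wants to repeat the experiment, here is a rough sketch of the kind of invocation I mean. It is not my exact setup: the function names and `scan.png` are made up for illustration. It assumes the `tesseract` binary is on your PATH and uses the `tessedit_char_whitelist`, `load_system_dawg` and `load_freq_dawg` config variables plus `--psm 6` (treat the page as one uniform block of text). Whether the whitelist is actually honored depends on your Tesseract version and engine, so treat this as a starting point.

```python
import string
import subprocess

# Base64 alphabet: A-Z, a-z, 0-9, '+', '/', plus '=' for padding.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="
)


def tesseract_command(image_path, whitelist=BASE64_ALPHABET):
    """Build a tesseract CLI call aimed at non-dictionary text (hypothetical helper)."""
    return [
        "tesseract", image_path, "stdout",             # print recognized text to stdout
        "--psm", "6",                                  # assume a single uniform block of text
        "-c", f"tessedit_char_whitelist={whitelist}",  # restrict the output alphabet
        "-c", "load_system_dawg=0",                    # don't load the main dictionary
        "-c", "load_freq_dawg=0",                      # don't load the frequent-words dictionary
    ]


def ocr_page(image_path):
    """Run tesseract on one scanned page and return the raw recognized text."""
    result = subprocess.run(
        tesseract_command(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `ocr_page("scan.png")` returns whatever Tesseract recognized. In my experience you then still have to diff it against the original to see how far off it is.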


Well, crap. Looks like QR codes are the better idea, then. Thanks for saving me a boatload of time!


There are fonts that were designed specifically to be OCRed easily and with high reliability. Look for "OCR A".



