OCR'ing line printer listings
HP Forums (https://www.hpmuseum.org/forum), General Forum, thread /thread-19550.html
OCR'ing line printer listings - artag - 02-12-2023 08:02 PM

I'm trying to OCR the hp9815 ROM listing from the patent (4089059) in order to make something searchable, and ideally re-assemblable as a sanity check.

I've used http://www.onlineocr.net and the results are pretty good (at least compared with the others I tried), but there are still many corrections to be made, as well as formatting differences. I think it's probably many times easier than retyping manually, though.

As I work through the corrections, I see many similar character recognition errors, and there are many hints that could perhaps be exploited automatically:

- Line printer font, whilst often broken, does degrade in specific ways
- The line numbers should be contiguous
- The addresses increment in a predictable manner
- There is a correlation between data and assembler symbols
- The opcodes, register names etc. are from a limited set

All these would help improve accuracy, but the OCR system is trying to recognise natural language in one of several human languages (selectable). Ideally, it should have options for at least font and assembler syntax.

Is there an OCR system that's been specifically trained on line printer output, or even assembly code?

RE: OCR'ing line printer listings - toml_12953 - 02-13-2023 10:36 AM

(02-12-2023 08:02 PM)artag Wrote: I'm trying to OCR the hp9815 ROM listing from the patent (4089059) in order to make something searchable, and ideally re-assemblable as a sanity check.

Have you tried Adobe Acrobat (full version, not Reader)? Its OCR is very good with line printer output, depending on the quality of the scan, of course.
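
As a rough illustration of the hints listed in the first post, some of them could be applied as a post-processing pass over whatever text the OCR tool produces. The sketch below is not tied to any particular OCR system, and everything in it is an assumption made for illustration: the look-alike character table, the `OPCODES` list (placeholder mnemonics, not taken from the 9815 listing), and the helper names `fix_number`, `fix_opcode` and `check_line_numbers`. It only covers decimal line numbers and the limited opcode set; addresses and symbol correlation would need knowledge of the real listing format.

```python
import difflib

# Assumed table of letters that OCR tends to confuse with digits.
# Only safe for fields known to be decimal (e.g. the line number column).
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1",
                             "S": "5", "B": "8", "Z": "2"})

# Placeholder mnemonics; replace with the real set used in the listing.
OPCODES = ["LDA", "STA", "JMP", "JSR", "RTS", "BNE", "BEQ"]


def fix_number(token):
    """Map look-alike letters to digits in a field known to be numeric."""
    return token.translate(DIGIT_FIXES)


def fix_opcode(token, cutoff=0.6):
    """Snap a token to the closest known mnemonic, or return it unchanged."""
    matches = difflib.get_close_matches(token.upper(), OPCODES,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else token


def check_line_numbers(numbers):
    """Return the indices where a line number is not the previous one plus 1."""
    return [i for i in range(1, len(numbers))
            if numbers[i] != numbers[i - 1] + 1]
```

Flagging non-contiguous line numbers rather than silently rewriting them seems the safer choice, since a gap may mean a dropped line and not a misread digit.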