DISPLAY - German complement to 65Notes + TI material [for our german reading fellows]
Yesterday, 08:46 AM (This post was last modified: Yesterday 08:51 AM by Martin Hepperle.)
Post: #16
Klaus,

thank you for the description of the process.

When I scanned the PRISMA journal, I used a similar sequence, but started with an office scanner/copier that directly produced a PDF of the 2-up pages. I scanned at 400 dpi in b/w to avoid fuzzy grayscale effects (checked by printing on a b/w laser printer) and to keep the file size within limits.

For splitting the pages and for OCR, I used Acrobat Professional, which offers a scripting engine. I wrote one short JavaScript function to split each A3 page into two A4 pages and another one to sort the pages (the original issues were A3 sheets stapled in the center, and I scanned the complete A3 sheets).
Similar scripts were used to split the 4-up documentation, e.g. for the HP-75.
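
The sorting step can be sketched in a few lines of Python. This is only an illustration, not the Acrobat script: it assumes every sheet was scanned front side first and then back side, from the outermost sheet inwards, and that each scan was split into a left and a right half in that order. The function names are made up for this example.

```python
def booklet_scan_order(num_pages):
    """Page numbers in the order they appear after splitting the scans.

    Assumed scan order (hypothetical, adjust to your own workflow):
    outermost sheet first, front side before back side, left half
    before right half.
    """
    if num_pages <= 0 or num_pages % 4 != 0:
        raise ValueError("a center-stapled booklet has a multiple of 4 pages")
    order = []
    for sheet in range(num_pages // 4):
        # Front side: a high page number on the left, a low one on the right.
        order += [num_pages - 2 * sheet, 2 * sheet + 1]
        # Back side: the next low page on the left, the next high one on the right.
        order += [2 * sheet + 2, num_pages - 2 * sheet - 1]
    return order


def reading_order_indices(num_pages):
    """For each page 1..num_pages, the 0-based index of its split half."""
    scan = booklet_scan_order(num_pages)
    position = {page: index for index, page in enumerate(scan)}
    return [position[page] for page in range(1, num_pages + 1)]


if __name__ == "__main__":
    # Two A3 sheets give an 8-page issue.
    print(booklet_scan_order(8))
    print(reading_order_indices(8))
```

The index list from reading_order_indices() can then be used to pick the split halves in reading order with whatever PDF tool is at hand.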

Nowadays I use a Python script with the pypdf library for similar jobs such as splitting pages (like the attached one).
In my split routines for scanned, bound books I add about 5 mm of extra space at the page centers to compensate for the binding of books that I cannot cut apart.
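
The geometry of such a split can be sketched as follows. This is not the attached script, only a minimal illustration under one assumption: the extra 5 mm is read here as letting each half reach slightly past the center line, so that text close to the binding is not cut off. The names split_boxes and overlap_mm are invented for this example, and the code only computes the rectangles; they would still have to be applied as page boxes with a PDF library such as pypdf.

```python
MM_TO_PT = 72.0 / 25.4  # default PDF user space unit is 1/72 inch


def split_boxes(left, bottom, right, top, overlap_mm=5.0):
    """Return (left_box, right_box) for a 2-up page.

    All coordinates are in PDF points. Each box is a tuple
    (x0, y0, x1, y1). Both halves extend past the center line by
    overlap_mm (an assumption about how the extra space is applied).
    """
    overlap = overlap_mm * MM_TO_PT
    if overlap < 0 or overlap > (right - left) / 2.0:
        raise ValueError("overlap must be between 0 and half the page width")
    center = (left + right) / 2.0
    left_box = (left, bottom, center + overlap, top)
    right_box = (center - overlap, bottom, right, top)
    return left_box, right_box


if __name__ == "__main__":
    # A landscape A3 scan, 420 mm x 297 mm, converted to points.
    width, height = 420 * MM_TO_PT, 297 * MM_TO_PT
    for box in split_boxes(0.0, 0.0, width, height):
        print([round(value, 1) for value in box])
```

With overlap_mm=0 the two boxes meet exactly at the center, which is what one would want for stapled sheets that lie flat on the scanner.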

I just found out that the commercial Foxit PDF Editor can also apply OCR to a selection, i.e. a rectangular region created by dragging the mouse; I was not aware of this feature. This would leave graphics and other content intact and would keep later text searches in the PDF from stumbling across text fragments in images and listings.

The thing to avoid is an OCR system that replaces the original text. For example, the Foxit tools offer either "hidden text over image" (which is o.k.) or "replacing image by text" (which destroys the original).
I came across several older Epson printer manuals on Epson's official web site that had been scanned and processed by Epson with the text replaced, and these manuals contain horrible errors in technical content.

Martin


Attached File(s)
.zip  splitpages.zip (Size: 1,009 bytes / Downloads: 10)