Thinking about hosting book scans online

I’ve been thinking at length about how best to present scanned books online. I know a lot of people think PDFs are best, but they’re blasted difficult to maintain, and the quality of searchable text depends on OCR… and OCR sucks in Adobe Acrobat Pro (at least).

But I may as well give it a shot, as a baseline against which I can compare further efforts in other directions. So I’ve arbitrarily chosen the book I scanned last night and put it together as a PDF of page images (no text yet). If nothing else, it will make a good test bed for software I might try developing.

It’s a single issue of a magazine, relatively short (112 scans) but full of very, very small type. It’s been downsampled quite a bit by Adobe, so it may not be entirely readable. Again, a good baseline from which to plan improvements.

Read and enjoy. It’s currently being proofread at Distributed Proofreaders, and sometime in a couple of years (!) it will be released into Project Gutenberg. Or maybe something better will happen to it in the meantime…
