Project sketch: FromOldCatalog.com

This is a software project; a simple web app. Ruby on Rails should do just fine.

Still waiting to do a real planning game session on it, but here are some general notes.

The main focus will be presenting scanned images of old booksellers’, auction, and publishers’ catalogs. Visitors will be able to do the normal stuff you might expect from a basic Web 2.0 framework: page through the booklets, search for text, comment on pages, tag things.

Somewhere along the line OCR or some other text conversion will need to happen, but at the moment there seems to be enough value for the participants in simply being able to search the catalogs based on back-office OCR markup. In other words, there is no need to present the text itself; the OCR can generate searchable text blocks in preprocessing.
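
One way to picture the search-without-showing-the-text idea is a small inverted index built from the OCR output during preprocessing. This is only a sketch in plain Ruby, not the project's design: `PageIndex`, `add_page`, and `search` are made-up names, the catalog ids are placeholders, and a real Rails app would more likely lean on the database or a dedicated search engine.

```ruby
# Hypothetical sketch: PageIndex and its methods are illustrative names,
# not part of any library or of the project itself.
class PageIndex
  def initialize
    # word => list of [catalog_id, page_number] pairs
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  # Record the OCR text for one scanned page. The text is only used to
  # build the index; it is never stored for display.
  def add_page(catalog_id, page_number, ocr_text)
    tokenize(ocr_text).uniq.each do |word|
      @postings[word] << [catalog_id, page_number]
    end
  end

  # Return the pages that contain every word of the query.
  def search(query)
    words = tokenize(query)
    return [] if words.empty?

    words.map { |word| @postings.fetch(word, []) }.reduce(:&).uniq
  end

  private

  # Lowercase the text and split it into runs of letters and digits.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = PageIndex.new
index.add_page("catalog-a", 3, "Rare Books and Manuscripts")
index.add_page("catalog-a", 4, "Books on Angling")
matches = index.search("angling books")
```

Here `matches` holds only page references, so the interface can show the matching scan without ever exposing the raw OCR text.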

There are long-term social aspects, and a number of potentially useful follow-ons, but as of now the core project objectives will be (in no particular order):

  • Simplicity for uploading and administering catalog scans: things should just basically spoot out of ABBYY FineReader into the database interface, somehow.
  • Searchability. There should be useful words of some sort that people can search over, and expect to see pages from catalogs they’re interested in. All that stuff about false positives and false negatives will play out in the testing phase, no doubt.
  • Commenting. Folks should be able to say stuff. Not so much the bots, but folks definitely.
  • Usability. Not too stupid an interface. Simple. Not cluttered; not inaccessible; not frilly; not full of uselessness or Google ads or such. Just pages, and some little dabs of interface.
  • Flexible domain model. It should be trivial and obvious to amend or improve the text versions of the page scans, to add missing pages whenever they’re found (and present partial works until then), and to manage multiple editions, versions, issues of periodicals, or volumes of sets.
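
To make the last bullet a bit more concrete, here is a rough plain-Ruby sketch of a domain model that tolerates missing pages and keeps every text revision. Everything in it is an assumption for illustration: the names `Catalog`, `Issue`, and `Page`, the idea that an issue knows its expected page count, and the file paths. In Rails these would presumably become ActiveRecord models, which this sketch does not attempt.

```ruby
# Hypothetical sketch: Catalog, Issue, and Page are illustrative names only.
Page = Struct.new(:number, :image_path, :text_versions) do
  # The most recently added transcription, or nil if there is none yet.
  def current_text
    text_versions.last
  end

  # Amending the text appends a new version instead of overwriting the old one.
  def amend_text(new_text)
    text_versions << new_text
  end
end

# One edition, version, issue, or volume of a catalog.
class Issue
  attr_reader :label, :expected_page_count

  def initialize(label, expected_page_count)
    @label = label
    @expected_page_count = expected_page_count
    @pages = {}
  end

  # Pages can arrive in any order, whenever a scan turns up.
  def add_page(number, image_path)
    @pages[number] = Page.new(number, image_path, [])
  end

  def page(number)
    @pages[number]
  end

  # Page numbers not yet scanned; the issue is still presentable without them.
  def missing_page_numbers
    (1..expected_page_count).reject { |n| @pages.key?(n) }
  end
end

class Catalog
  attr_reader :title, :issues

  def initialize(title)
    @title = title
    @issues = []
  end

  def add_issue(label, expected_page_count)
    issue = Issue.new(label, expected_page_count)
    @issues << issue
    issue
  end
end

catalog = Catalog.new("Example Bookseller Catalog")
issue = catalog.add_issue("No. 12", 4)
issue.add_page(1, "scans/no12/p1.png")
issue.add_page(3, "scans/no12/p3.png")
gaps = issue.missing_page_numbers
```

With pages 1 and 3 loaded out of four, `gaps` lists the pages still to be found, and the partial issue can be shown in the meantime.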

More in a bit.
