Proposal: On Closing the Set

Over at the Distributed Proofreaders project, our goal is to accurately capture, present and distribute the content of printed works in the public domain. Unlike the land grab efforts of Google Print and the like, our texts are [supposedly] better, more accurate, truer to the original published form. We rely on OCR to read the scanned text, but unlike those others we also acknowledge that OCR is fallible, and that certain typographic conventions that convey subtle meanings, such as line breaks and em-dashes, need to be preserved.

One of our number, Jon Niehof [a.k.a. jnik], has a great and useful idea whose time has clearly come. Some time back, he began collecting the lists of books (or, more generally, titled works) mentioned in the works we’ve scanned and proofread. The point for us working in the DP community is of course to complete the set, to create a moving front from which the next books for scanning can be chosen.

And at the same time to create a self-consistent record of literature’s explicit relationships.

And at the same time to create a dataset to record a novel “social” network, which is at the moment a subject of some interest.
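For the programmers in the audience, here is a minimal sketch of what “closing the set” might look like as a data structure. Everything in it is my own assumption, not Jon’s actual list format: the class name `MentionGraph`, its methods, and the placeholder titles are all hypothetical. It records which scanned works mention which other works, and reports the moving front, meaning the titles that have been mentioned but not yet scanned.

```python
from collections import defaultdict


class MentionGraph:
    """Directed graph of works: an edge A -> B means work A mentions work B.

    Hypothetical illustration only; titles are plain strings.
    """

    def __init__(self):
        self.mentions = defaultdict(set)  # title -> set of titles it mentions
        self.scanned = set()              # titles already scanned and proofread

    def add_scanned(self, title, mentioned_titles=()):
        """Record a scanned work along with the works it mentions."""
        self.scanned.add(title)
        self.mentions[title].update(mentioned_titles)

    def frontier(self):
        """Return (title, count) pairs for works mentioned by scanned works
        but not yet scanned, most-mentioned first, ties broken by title."""
        counts = defaultdict(int)
        for source in self.scanned:
            for target in self.mentions.get(source, ()):
                if target not in self.scanned:
                    counts[target] += 1
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    def is_closed(self):
        """The set is closed when every mentioned work has been scanned."""
        return not self.frontier()


# Placeholder titles, purely for illustration.
graph = MentionGraph()
graph.add_scanned("Work A", ["Work B", "Work C"])
graph.add_scanned("Work B", ["Work C"])
print(graph.frontier())   # "Work C" is mentioned twice and not yet scanned
print(graph.is_closed())
```

Ranking the front by how often a title is mentioned is just one plausible way to choose the next book to scan; a real effort would also have to decide when two differently worded mentions refer to the same work, which this sketch ignores.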

With Jon’s permission, I’m going to suggest it’s time to take it out of DP and involve the community most interested in it. In DP we are overwhelmed with work, and the community’s conversation centers around how to get the workflow slimmed down, not extended without horizon. [That said, please consider going and giving it a try. You will be helping a unique volunteer effort that captures all the good of the land grabbers, and can have a say in how it moves. I would consider signing up, and proofreading five pages, to be your way of acknowledging the fact that you’ve read this piece.]

What is needed, I think, is a special-purpose wiki, seeded with some starting point. Users could add works cited, mentioned, advertised, or otherwise appearing in others. And by works I mean not merely novels and technical monographs, but catalogs, reviews in magazines, and perhaps ultimately newspaper columns.

Consider the benefits that could arise. First, it would form a sort of table of contents or directory, since of course any title could eventually be linked to scans or Gutenberg editions of the actual work. Second, there’s that network, that record of what appears where. Third, it will give me an excuse to do something I’ve been putting off for some time (and which Jon once fretted would swamp his little internal DP effort): scan and upload our recently purchased copy of Allibone’s A critical dictionary of English literature…, which with its supplements mentions well over 130,000 works.

So easy to do. One wiki, slowly accreting.

But me, a mere student, a first-year graduate student in of all things engineering? Hah. They would drum me out for being distracted by such non-mathematical trivialities, for not being “serious” about my studies, for having my nose well away from the grindstone and gazing off towards left field. Those meanies.

And think how many brownie points something like this would bring to a professional, a real live scholar of literature, or of networks, or of practically anything not involving linear programming.

It is yours. Please. Go right ahead. I will send along our Allibone as soon as it is ready.

Update [20 Jan 2006]: Jon Niehof should get credit for his great idea of Closing the Set.

This entry was posted in Uncategorized by Tozier.

2 thoughts on “Proposal: On Closing the Set”

  1. It seems to me that there are two types of set here, though. Allibone’s is more of a catalogue: it doesn’t cite the books, it only records their existence (and sometimes gives third-party reviews of them). The other type of set is like the one in Clouston’s Flowers from a Persian Garden, where he references dozens of other books to support and expand on his topic.

    Just because you have a paper indexed in CiteSeer doesn’t mean CiteSeer cites you, does it? So why would Allibone’s be a good starting point?

  2. True. In fact, there seem to be multiple ways that other works are mentioned, too. Imperative (“Go read this”), citation, in passing, indirectly. And various levels of certainty, as in, “Mrs. Oliphant has written extensively on…”, as opposed to, “In Mrs. Oliphant’s ‘The Count’s Daughters’.…” &c
