An interesting juxtaposition this New Year’s morning.
On the one hand, over on the XP mailing list there’s been a revival of discussion of the benefits of pair programming, with some very interesting side-trips into software project billing practices, the psychology of developers, and developer–management relations. If you haven’t been exposed to the old chestnut, it goes roughly like this: Why should we waste twice as much money to have two programmers sit in front of one keyboard? To which the quantitative answer is that the net productivity of two developers pairing (though perhaps not using the entire XP practice set) is 15% higher than that of the same two programmers working alone, once you take into account the time spent correcting their errors and bugs. Nobody’s actually managed to measure the time and effort difference between a full-fledged XP team and a cubicle-blocked traditional basement cowboy effort, but I expect the number will be larger.
The original querulous complaint betrays ignorance of what programming is: the speaker equates software development with typing-and-mousing, and elides the craft of it. The same fallacy leads to the shoddiness of most academic software written by isolated students, and can be heard among many freshly-graduated programmers who view pairing as a ball-and-chain rather than a boost to insightful thought. (Such developers, apparently, think all other programmers are worse than they are, and thus will only slow them down. This is a fully symmetric relation; the other developers think that of the first one, too.)
My point being: pairing leads to no significant loss in productivity, taking into account subsequent time spent on error-correction.
Now, on the book-digitization front, there’s a slow-burning discussion involving Double-Key Entry cropping up both on the Distributed Proofreaders’ threads [sub], and the gutvol-d mailing list [sub]. Many studies through the years have shown that having two (skilled) typists typeset a manuscript into two parallel versions, and then “folding together” their work product, results in a significant reduction in errors. This makes sense, insofar as what we’re measuring is pure quality of the end product. It’s hard to calculate the total effort expended, though: surely a third person is required to compare and collate the different versions, hopefully in light of the original manuscript as well.
Worse. Except in extraordinary circumstances, people don’t type electronic texts: they use OCR software to digitize them, and base these on a fixed set of page image scans. In my extensive experience (I conservatively count 14,000 pages personally scanned and OCRed to date), repeated OCR of the same image with ABBYY FineReader Pro produces identical results: the OCR algorithm is deterministic. Thus, parallel OCR cannot change outcomes, unless different algorithms are brought to bear. Rather, parallel scans need to be used to reap the benefits of double-key entry in digitization.
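To make that concrete, here’s a toy sketch in Python. Everything in it is invented for illustration: `toy_ocr` is a hypothetical stand-in for a deterministic OCR engine (it is not FineReader’s API), the “smudges” are an assumed error model, and `flag_disagreements` is just one plausible way to do the collating step, using the standard library’s `difflib`.

```python
import difflib


def flag_disagreements(version_a: str, version_b: str):
    """Return (words_a, words_b) pairs wherever two transcriptions differ."""
    words_a = version_a.split()
    words_b = version_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    conflicts = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            conflicts.append((words_a[a_start:a_end], words_b[b_start:b_end]))
    return conflicts


def toy_ocr(page_text: str, smudged_words: set) -> str:
    """Hypothetical deterministic OCR: misreads 'm' as 'rn' in smudged words only.

    The output depends on nothing but the scan (text plus smudge positions),
    so running it twice on the same scan always gives the same string.
    """
    return " ".join(
        word.replace("m", "rn") if index in smudged_words else word
        for index, word in enumerate(page_text.split())
    )


page = "it was the best of times it was the worst of times"
scan_one = {5}   # assumed: a smudge over the first "times"
scan_two = {11}  # assumed: a different smudge, over the last "times"

# Parallel OCR of the SAME scan: identical output, so nothing gets flagged.
same_scan_conflicts = flag_disagreements(
    toy_ocr(page, scan_one), toy_ocr(page, scan_one)
)

# Parallel SCANS: the errors land in different places and show up as conflicts.
two_scan_conflicts = flag_disagreements(
    toy_ocr(page, scan_one), toy_ocr(page, scan_two)
)

print(same_scan_conflicts)
print(two_scan_conflicts)
```

The first comparison comes back empty even though the misreading is still sitting there in both copies; only the second one hands the human collator anything to look at. Real scans won’t be this tidy, of course, and two scans can still share an error if the fault is in the printed page itself.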
That said, digitization in some primordial sense seems to involve less craft than typing-and-mousing: You got your image, you run your program, you get your text.
There’s something missing there, isn’t there? Something that currently takes the efforts of five to seven variously skilled and vetted volunteers to inject into the work product. OCR does not (and in many cases, I’ll argue, cannot) produce an electronic edition of a book. Advanced OCR software can suss out layout and typeface, and some fancy packages can even sense the difference between a thought-break and a section ending. But the human volunteers not only correct transcription (“syntactic”) errors, but also capture the semantic content of the work.
Digitization, at least to the standards we tacitly use at Distributed Proofreaders, takes craft as well. It’s a lot more than typing.
And so I find myself wondering what work practices might combine the strengths of double-key-entry-plus-serial-improvement and diligent pair “programming” (transcription).
I haven’t got it yet. Just thinking.

