Pair “programming” and book production

An interesting juxtaposition this New Year’s morning.

On the one hand, over on the XP mailing list there’s been a revival of discussion of the benefits of pair programming, with some very interesting side-trips into software project billing practices, the psychology of developers, and developer–management relations. If you haven’t been exposed to the old chestnut, it goes roughly like this: Why should we waste twice as much money to have two programmers sit in front of one keyboard? To which the quantitative answer is that the net productivity of software developers using pair programming (but perhaps not the entire XP practice set) is 15% higher than that of the same two programmers working alone, once you take into account the time spent correcting their errors and bugs. Nobody’s actually managed to measure the time and effort difference between a full-fledged XP team and a cubicle-blocked traditional basement cowboy effort, but I expect the number will be larger.

The original querulous complaint rests on a fallacy, and shows ignorance of what programming is: the speaker equates software development with typing-and-mousing, and elides the craft of it. The same fallacy leads to the shoddiness of most academic software written by isolated students, and can be heard among many freshly-graduated programmers who view pairing as a ball-and-chain rather than a boost to insightful thought. (Those developers, apparently, think all other programmers are worse than they are, and thus will only slow them down. The relation is fully symmetric; the other developers think that of the first one, too.)

My point being: pairing leads to no significant loss in productivity, taking into account subsequent time spent on error-correction.

Now, on the book-digitization front, there’s a slow-burning discussion involving Double-Key Entry cropping up both on the Distributed Proofreaders’ threads [sub], and the gutvol-d mailing list [sub]. Many studies through the years have shown that having two (skilled) typists typeset a manuscript into two parallel versions, and then “folding together” their work product, results in a significant reduction in errors. This makes sense, insofar as what we’re measuring is pure quality of the end product. It’s hard to calculate the total effort expended, though: surely a third person is required to compare and collate the different versions, hopefully in light of the original manuscript as well.
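To make the “folding together” step concrete, here is a minimal sketch of how the two keyed versions might be compared so that the third person only has to look at the places where the typists disagree. It uses Python’s standard `difflib` module; the function name `find_disagreements` and the sample sentences are my own illustration, not anything taken from the studies or from the Distributed Proofreaders tooling.

```python
import difflib


def find_disagreements(key_a: str, key_b: str) -> list[tuple[str, str]]:
    """Return the word-level spans where two independent transcriptions differ.

    Hypothetical helper for illustration: each tuple holds typist A's
    words and typist B's words for one span of disagreement.
    """
    words_a = key_a.split()
    words_b = key_b.split()
    # autojunk=False turns off difflib's "popular element" heuristic,
    # which would otherwise ignore very common words in long texts.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    disagreements = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append(
                (" ".join(words_a[a0:a1]), " ".join(words_b[b0:b1]))
            )
    return disagreements


# Made-up sample: each typist makes one error, in a different place.
typist_a = "It was the best of tirnes, it was the worst of times"
typist_b = "It was the best of times, it was the worst of limes"

for a_words, b_words in find_disagreements(typist_a, typist_b):
    print(f"A: {a_words!r}  B: {b_words!r}  (check against the manuscript)")
```

Note that a comparison like this only flags spans where the two versions differ. If both typists make the same mistake, or if both inputs are identical, it returns an empty list and nothing reaches the collator.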

Worse. Except in extraordinary circumstances, people don’t type electronic texts: they use OCR software to digitize them, and base these on a fixed set of page image scans. In my extensive experience (I conservatively count 14,000 pages personally scanned and OCRed to date), repeated OCR of the same scans with ABBYY FineReader Pro produces identical results: the OCR algorithm is deterministic. Thus, parallel OCR cannot change outcomes, unless different algorithms are brought to bear. Rather, parallel scans need to be used to reap the benefits of double-key entry in digitization.

That said, digitization in some primordial sense seems to involve less craft than typing-and-mousing: You got your image, you run your program, you get your text.

There’s something missing there, isn’t there? Something that currently takes the efforts of five to seven variously skilled and vetted volunteers to inject into the work product. OCR does not (and in many cases, I’ll argue, cannot) produce an electronic edition of a book. Advanced OCR software can suss out layout and typeface, and some fancy packages can manage to sense the difference between a thought-break and a section ending. But the human volunteers not only correct transcription (“syntactic”) errors; they also capture the semantic content of the work.

Digitization, at least to the standards we tacitly use at Distributed Proofreaders, takes craft as well. It’s a lot more than typing.

And so I find myself wondering what work practices might combine the strengths of double-key-entry-plus-serial-improvement and diligent pair “programming” (transcription).

I haven’t got it yet. Just thinking.

This entry was posted in Uncategorized by Tozier.
