Notional Slurry

Measuring differential attention in a distributed volunteer community

Distributed Proofreaders (DP) is an online community of several hundred volunteers who scan books and magazines that are in the public domain, and then collectively correct character recognition and formatting errors made by automated OCR software, producing high-quality text and HTML files that are released into the public domain, for free, at Project Gutenberg.

A “project” at DP is typically a single work: a book, a magazine number, a small portion of an encyclopedia, a monograph, a song. The complex DP workflow moves these projects through a series of well-defined stages, with tasks ranging from the scanning of the physical work by individual content-providers, through two separate rounds of (text) proofreading of the individual pages of the work, two further rounds of formatting to capture details of typographic and mid-scale semantic structure, and finally a number of rounds of post-processing to re-integrate the separated pages into a single coherent work. Only after they’ve passed through the “hands” of dozens of volunteers are projects considered complete and sent to the archives for public distribution.

In the proofreading and formatting stages of the workflow, there are several hundred works in progress at a given time, each of which may have several hundred pages. Volunteers can choose whichever active project they prefer to work on, one page at a time. Thus, at any given moment there is a wide variety of works in process, in many languages, of many styles and in numerous domains and categories, aimed at many different audiences and age levels. Each project poses unique challenges and rewards to the project managers, the volunteers, and the eventual consumers.

In the administrative sense, the flow dynamics of projects in DP is controlled by a suite of complex parallel and sequential queues governed by numerous rules, and projects are pushed and pulled in various ways through the different stages. But at its core, the flow rate of projects through DP is driven by the collective preferences of the volunteers who pick pages — each according to her own criteria and ability — to work on. Through the interaction of the driving force of collective interest with the underlying structure of administrative rules, some projects languish for months or years without progressing, while others race through the stages of DP in a matter of hours.

Not surprisingly, the reasons for these differences are not simple.

Projects in DP differ widely on the basis of many structural features — the number of pages, the length of the pages, the number of illustrations and tables. And quality features — the clarity of the scans and the accuracy of the OCR, the presence of automatically-produced HTML markup. And matters of content — language, fiction vs. nonfiction, technical vs. narrative work, religious content, &c. And there are the ill-defined “marketing” factors that most affect their relative interest to the volunteers, that might drive somebody to choose one over another to proofread. Thus, a novel that moves through the system quickly could be doing so because the pages are short, or because the OCR is especially clean, or because the story is engaging, or because the author is famous. An essay on linguistics might take weeks to complete a stage simply because it concerns an uncomfortable topic, or the pages are inordinately messy, or there is a single difficult table “blocking” the volunteers’ progress.

The average speed at which work passes through the DP system has been remarkably constant over the last year, at least in terms of projects completed. But there is a growing problem with the workflow: the queue of projects waiting to enter the second stage of proofreading is already dangerously large, and there is no evidence that the community will be able to compensate without a radical structural solution or gross change in policy. While a number of discussions are underway about how best to alleviate the backlog, experience conspires to undermine the collective confidence in any one of them: the current system was put in place nearly one year ago to address other aspects of unbalanced workflow. More importantly, the system of controls and queues is so complex that it defies understanding and simple control.

To that end, my wife Barbara and I are planning to build a detailed model of the DP workflow, at a sufficiently fine scale to capture every important aspect of the site’s dynamics. With that model, we’ll be able to really explore and understand “what if” scenarios: changes in the dynamics of queues, changes in the balance and even the quality of work. Over the next few months, Barbara and I will report our progress on this overarching project.

That said, we’re left with an important job to do beforehand. To compare one model or policy with another, we first need to understand not just what’s happening but also what’s wanted. We need to establish a set of benchmark measurements.

“Attention is being paid”

We’re not concerned with the relative contribution or performance of any volunteer or project manager, nor should we be. The diversity of volunteer interests, abilities and intentions is what makes the Distributed Proofreaders community so effective and strong.

Rather, the family of Project Managers and administrators — and also the eventual consumers — is much more concerned with the flow of projects through the system. Any Project Manager prefers to see their work move ahead quickly. Any administrator prefers to see backlogs emptied as much as possible. Any interested reader wants to read the book as soon as possible, and they can’t until it’s released to the public.

Throughout the DP community there’s clearly already a broad appreciation of this. “Special Days” have been set up to ease the release of works that would otherwise wait in queues. The first project of any new Project Manager is expedited. Ad hoc teams of volunteers have arisen that specialize in completing works that have spent the “longest time” in the system. More crucial, the current backlog of projects waiting for the second round of proofreading is in itself no problem, except insofar as it threatens to delay ongoing projects too long.

In a way, all these are adaptations to bring extra “attention” to projects that are otherwise being missed. Work only gets done by voluntary attention from the community members. While every individual wants most to do what’s fun and entertaining, at the same time there’s a real sense that the emergent progress over all projects should be equitable and in some sense balanced.

That it takes all kinds, in other words.

From an administrative standpoint, any comparison between alternative policies and structural changes should take into account the effects they have on both the individual and overall progress of projects.

Current practice for estimating the progress of projects tends to focus on the number of projects being released into and out of each stage. But as I mentioned above, this ignores the diversity of work underway. Is it preferable to release many small novels at the expense of important — but difficult — larger reference works? Anecdotally, the “fastest” projects are the smallest and simplest ones. Because there are more of them, and they get read faster, they often bypass larger more scholarly works in the system… and thus they tend to form a large fraction of the end product released.

In the spirit of preserving diversity, is this the most desired outcome?

Quantifying the degree of attention

What I’m proposing here is a method of measuring the relative attention paid to projects. One that takes into consideration the many diverse attributes of those projects: the language, clarity, difficulty, scale, and all the other qualities that must be balanced to compare them equitably. And equity is the important point, of course. There is a tacit but strong sense among the volunteers that everything in process should move on, no one at the expense of any other.

The overall DP workflow is surprisingly complex. Because there is a pressing need to address it right now, I’ll focus in this work on projects passing through the second proofreading (P2) stage. It’s hoped that by developing this approach there, we can later derive a more general approach for balancing flow across the entire DP system.

Briefly, I’m proposing to apply a mathematical technique called Data Envelopment Analysis (DEA) to compare projects that have completed the P2 stage. DEA allows the analyst to understand especially complex problems without over-simplifying effects or throwing away data that gets lost in regression and many other statistical methods.

In framing a DEA model of P2 projects at DP, I’ll collect measurements that are roughly divided into a set of “inputs” and categories, and a set of “outputs”. “Inputs” and categorical variables for this analysis should include traits of the projects (not just the works, but the DP projects) themselves: the number of pages in the work, the number of characters on each page, the language, the number of different proofreaders working in the two proofreading phases P1 and P2, the amount of poetry or mathematics involved, &c. “Outputs” in this case should reflect our standardized measurement of “productivity”. I’ll propose we look at the number of days needed per page completed in the P2 stage.
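For readers who want the formal version, here is a sketch of the textbook output-oriented, constant-returns DEA model. It is one common variant, and not necessarily the one this analysis will end up using. The symbols are my own labels, not DP terminology: $x_{ij}$ is the amount of input $i$ recorded for project $j$, $y_{rj}$ is its output $r$ (assumed to be oriented so that larger is better, for example pages per day rather than days per page), and $o$ is the project being scored against all $n$ projects.

```latex
\begin{aligned}
\max_{\phi,\,\lambda}\quad & \phi \\
\text{subject to}\quad
  & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, && i = 1,\dots,m, \\
  & \sum_{j=1}^{n} \lambda_j y_{rj} \ge \phi\, y_{ro}, && r = 1,\dots,s, \\
  & \lambda_j \ge 0, && j = 1,\dots,n.
\end{aligned}
```

The score for project $o$ is $1/\phi^{*}$, so a project with $\phi^{*} = 1$ sits on the frontier and any larger $\phi^{*}$ says how much faster a weighted blend of comparable projects managed to go. Adding the constraint $\sum_j \lambda_j = 1$ gives the variable-returns-to-scale variant.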

DEA in this context is an exploratory, not a prescriptive technique. While the technical details aren’t important here, the gist is this: DEA is first used to identify the subset of projects that are processed fastest compared to others that “use” the same amount of the “inputs”; we’ll call these the ones receiving “maximum attention” from the volunteers, but in DEA parlance they’re often referred to as “efficient”. Moreover, DEA provides a measure of how “inefficient” the other projects are; here we can think of this as a measure of how much less attention they’re getting, compared with other projects with comparable characteristics.
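To make the gist concrete, here is a minimal sketch in Python. It is not full DEA, which needs a linear-programming solver. Instead it uses Free Disposal Hull (FDH), a simpler relative that compares each project only against projects actually observed rather than against weighted blends of them. Everything in it is an assumption for illustration: the `Project` record, the `attention_scores` function, the two example inputs (proofreader count and a made-up "ease" rating) and all the numbers. Inputs are assumed to be oriented so that a larger value makes fast progress easier, so a difficulty trait such as characters per page would have to be inverted first. The output is pages per day, the reciprocal of days per page, because these methods treat outputs as larger-is-better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    inputs: tuple          # each entry oriented so that larger = easier to finish fast
    pages_per_day: float   # output; reciprocal of days per page, assumed positive


def attention_scores(projects):
    """Return {name: score} with each score in (0, 1].

    Output-oriented FDH: a project is compared only with observed projects
    that used no more of ANY input than it did. Its score is its own speed
    divided by the best speed among those peers.
    """
    scores = {}
    for p in projects:
        peers = [
            q for q in projects
            if all(q_in <= p_in for q_in, p_in in zip(q.inputs, p.inputs))
        ]  # p always qualifies as its own peer, so `peers` is never empty
        best = max(q.pages_per_day for q in peers)
        scores[p.name] = p.pages_per_day / best
    return scores


# Invented figures, for illustration only.
# inputs = (number of proofreaders, hypothetical "ease" rating)
projects = [
    Project("short novel",       (12, 1.0), 40.0),
    Project("long novel",        (30, 1.0), 45.0),
    Project("linguistics essay", (15, 0.4), 5.0),
    Project("encyclopedia part", (35, 0.5), 20.0),
    Project("poetry pamphlet",   (12, 0.4), 10.0),
]

scores = attention_scores(projects)
for name, score in scores.items():
    print(f"{name:20s} {score:.2f}")
```

With these made-up figures every project scores 1.0 except the linguistics essay, which scores 0.5: the poetry pamphlet had no more proofreaders and equally easy pages, yet moved twice as fast. A score of 1.0 only means that no comparable project in the sample was faster, not that the project could not have gone faster still.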

I’m being careful not to talk about “efficiency” as such, mainly because it’s a term of art in this context. This isn’t some sort of hare-brained proposal to “improve efficiency at DP”. Rather, it’s a way of measuring what’s actually getting done. My hope is that by measuring it, we (as analysts, administrators and volunteers) can look at the process and learn more about what actually happens in DP. While the biggest benefit of distributing work among many individuals is the “wisdom of crowds” effect we read about in the scientific press these days, the negative side is that crowds as such aren’t that good at communicating the finer details of self-knowledge.

Useful outcomes

This project offers at least four benefits I can see.

First, it can be the rational framework we use to compare alternative policies. This will be especially useful in a few weeks, when a modeling framework comes along that allows simultaneous exploration of numerous alternatives. DP is extraordinarily complex, and understanding the shifts and flows and their effects on productivity and quality is difficult even for small segments, let alone the whole system.

Second, labeling each active project with a number called “attention being paid to this project” could provide important information for project managers and administrators, even within the current system. My analysis may highlight ways individual PMs can manage and fine-tune the progress of their own projects (even at the expense of others’), or it could spark the invention of marketing and incentive systems to help drive volunteer attention one way or another.

Third, this might focus the collective attention of the volunteer community as a whole. We’ve all heard the old saw: “What gets measured gets done.” As I’ve pointed out above, I think there’s a consensus that equity is good. If only the collective knew what progress was being made, they might be able to better collectively balance it.

Finally, say it works. If a robust automated system can be built that helps with this sort of “balance”, it might save DP. There are going to be more volunteers all the time, and more content and more projects waiting to move ahead. The present setup strains the collective management skills of the current community; witness the backlog of P2 projects. If we incorporate this sort of self-awareness into the automated infrastructure to better steer projects more transparently and efficiently, we’ll surely benefit over the situation with the current suite of ad hoc rules that are frequently overridden or “gamed”.

I’m sending requests for the necessary data (and a duplicate of this post) to the DP website itself, but will be happy to discuss the progress of the analysis here or there.

There aren’t a lot of collective collaborative production-oriented communities yet. I’d be interested in hearing anything you might know about other examples.

Vicky said,

April 21, 2006 @ 6:00 pm

I’ve only been proofreading at DP for the last week and didn’t know that a large amount of projects are waiting to enter the second stage of proofreading.

Do you think that very many people who are qualified to proof in the second round might not be doing it for one reason or another? I know that I was thinking that I might not feel ready to do that when the time comes. But I guess people should be doing this if they’re able to.

Also, I thought there were thousands of volunteers and not hundreds, as you state above.

Thank you for an interesting website and all the great work you do!

Tozier said,

April 21, 2006 @ 8:43 pm

Vicky, you can see the projects waiting to enter the four main phases at DP through the DP release queues page (login required). When I wrote this entry, the P2 queue was well above 1500 projects. Since we’ve started the “p2alt” experiment, it’s only dropped by a couple of hundred.

I know I don’t do P2, and I’m not sure I can justify it rationally: there’s not really that much difference between the phases, and I was qualified to do Round 2 proofing under the old system (proofreading and formatting used to be combined). I find it very hard to do P2 without throwing a lot of formatting in, still. And since that makes some rules lawyers mad, and rules lawyers make me mad… well.

You’re right about the thousands count, in one sense. But the number of pages done is something like a lognormal distribution: the vast, vast majority of volunteers have proofed less than 20 pages.

ellenweber said,

May 31, 2006 @ 5:36 pm

It is so true that what gets measured gets done and I like the way you connected measurement to improved alternatives here…. I would love to hear more about how this work will help us to ensure better models for diversity… and I would also like to see more of your good work come to the public forum for good discussion. What an interesting blog and I should think there’d be valuable input from voices from many cultures…What do you think?
