# What does this remind you of, Nerdy?

This is just a telegraphic note to get this out of my head and onto paper quickly.

I’m thinking about crowdsourced proofreading, or recommender systems, or any of several other problems in bipartite network theory (to get all nerdy and crap). In general, suppose there is a collection of $M$ people, and each of them is looking at some subset of $N$ objects.

If they’re proofreaders looking at transcribed pictures of pages and checking for errors, then whenever person $m_i$ checks and signs off a page $n_j$, we connect node $m_i$ to $n_j$ with an edge.

If they’re web surfers looking at sites and tweets and shit and checking for awesomeness, then whenever person $m_i$ visits and recommends a URI $n_j$, we connect node $m_i$ to $n_j$ with an edge.

And so on.

We all know a lot of different ways of scoring the things these people are looking at, don’t we Nerdy? PageRank, a bunch of network theory things, &c &c.

Me, I’m musing about something slightly different. I think. I’m thinking about confirmation, and collaboration, and how the step from one to two people vouching for something is “worth” more than the step from 17 to 18 people vouching for it.

Let’s score things this way:

1. For each person, observe the set of objects to which they’re linked. Suppose for example that Ada linked to items #1, #2 and #3, Byron linked to item #2, and Charles linked to item #2.
2. Similarly, for each item, collect the set of people who are linked to it. In our example, item #1 is linked from Ada, item #2 from Ada, Byron and Charles, and item #3 from Ada.
3. The score of a person is increased by 1 for every item they link to whose audience (the subset of people linking to it) is unique among their items. In our example, Ada links to item #1 (worth 1 point) and item #2 (worth another point, because $\{A\} \neq \{A, B, C\}$), but item #3 is worth no more points, since it is linked from the same subset of people as item #1. Byron’s score is 1 because he links to item #2; so is Charles’s.
4. The score of an item is [some statistic of] the scores of the people linked to it. Maybe average, maybe sum; doesn’t really matter to me just now.
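The four steps above can be sketched in a few lines of Python. This is just my own sketch; the function names, and the choice of `sum` as the item statistic, are illustrative assumptions rather than anything settled:

```python
from collections import defaultdict

# The example from the text: Ada -> {1, 2, 3}, Byron -> {2}, Charles -> {2}.
links = {
    "Ada": {1, 2, 3},
    "Byron": {2},
    "Charles": {2},
}

def audiences(links):
    """Step 2: for each item, the set of people linked to it."""
    aud = defaultdict(set)
    for person, items in links.items():
        for item in items:
            aud[item].add(person)
    return aud

def person_scores(links):
    """Step 3: one point per *distinct* audience among a person's items."""
    aud = audiences(links)
    return {person: len({frozenset(aud[item]) for item in items})
            for person, items in links.items()}

def item_score(item, links, stat=sum):
    """Step 4: [some statistic of] the scores of the people linked to it."""
    scores = person_scores(links)
    return stat(scores[person] for person in audiences(links)[item])

print(person_scores(links))  # → {'Ada': 2, 'Byron': 1, 'Charles': 1}
```

Counting distinct audiences is equivalent to the rule as stated: an item adds a point only if no other item of yours already has the same subset of readers.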

The interesting thing to me at the moment is understanding the dynamics of discovery as a game here. Ada hasn’t got any way to increase her score without discovering some item #4. If Byron links to item #1 or #3, he increases his score, and also Ada’s score indirectly, because she will then link to three items with different “audiences”. But if Charles follows suit, and links to the same item Byron does, that shared advantage disappears for both of them. If everybody links to everything, the scores are eroded away to 1 across the board.
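Those dynamics can be checked directly with the same kind of sketch (again my own toy code, nothing canonical):

```python
from collections import defaultdict

def person_scores(links):
    """One point per distinct audience-set among the items a person links to."""
    aud = defaultdict(set)
    for person, items in links.items():
        for item in items:
            aud[item].add(person)
    return {person: len({frozenset(aud[item]) for item in items})
            for person, items in links.items()}

# The running example: Ada has two distinct audiences, the others one each.
base = {"Ada": {1, 2, 3}, "Byron": {2}, "Charles": {2}}

# Byron also links item #1: his score rises, and so does Ada's, indirectly.
byron_moves = {"Ada": {1, 2, 3}, "Byron": {1, 2}, "Charles": {2}}

# Charles follows suit: the shared advantage disappears for both of them.
charles_follows = {"Ada": {1, 2, 3}, "Byron": {1, 2}, "Charles": {1, 2}}

# Everybody links to everything: scores erode to 1 across the board.
everybody = {p: {1, 2, 3} for p in ("Ada", "Byron", "Charles")}

for links in (base, byron_moves, charles_follows, everybody):
    print(person_scores(links))
```

Running it shows Ada going 2 → 3 → 2 and Byron 1 → 2 → 1 across the first three scenarios, with everyone pinned at 1 in the last.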

So what problem am I trying to solve here?

In the antiquated crowdsourcing system of Distributed Proofreaders, there have been numerous serious issues with quality and diversity through the years. Originally proofreaders’ “scores” were just the number of pages they read and signed off as being “correct”; this of course led to click-through gaming that didn’t actually improve the quality of texts. But there’s still a lot of money on the table there, since I often see books in which a dozen consecutive pages have been proofread by the same pair of people.

This, to me, is a concern. Because of Scott Page’s work, among other things.

There’s a Goldilocks point here. If everybody does non-overlapping work, there is no community, and no chance for checking one another’s work. If everybody does everything, there’s a lot of redundancy in the work, and returns diminish quickly. But somewhere in between is a problem of sub-community structure: if two people consistently apply themselves to the same work, there must be an accompanying diminishment in their joint contribution.

What I’m trying to do here is reward coverage.

Similarly, I’m concerned with the sustainable quality of links generated by crowdsourced systems like delicious.com or pinboard.in (or even Google PageRank) as bots spew random-seeming links across the network. At the moment I have a search window open for “genetic programming” on Twitter, and seventeen of the eighteen hits are bots that simply link to Amazon-affiliated books.

So anyway: just thinking.

If I had a collection of $M$ players and $N$ items, what strategy for linking to items would maximize the score of an individual player? The average score of the entire collection?
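For small $M$ and $N$ the individual question can at least be brute-forced. Here’s a toy search over one player’s strategies; the other players’ fixed link sets are made-up assumptions of mine, chosen just to give the search something to play against:

```python
from collections import defaultdict
from itertools import chain, combinations

def person_scores(links):
    """One point per distinct audience-set among the items a person links to."""
    aud = defaultdict(set)
    for person, items in links.items():
        for item in items:
            aud[item].add(person)
    return {person: len({frozenset(aud[item]) for item in items})
            for person, items in links.items()}

def powerset(xs):
    """Every possible linking strategy: each subset of the items."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# Hold the other players fixed (arbitrary choices) and try everything for Ada.
others = {"Byron": {1, 2}, "Charles": {2, 3}}
items = range(1, 4)

best = max(powerset(items),
           key=lambda s: person_scores({**others, "Ada": set(s)})["Ada"])
print(best)  # → (1, 2, 3)
```

In this tiny instance linking to everything happens to win for Ada, because the other two players’ sets already differ, so each of her items gets a distinct audience. Once everybody plays that way, the erosion noted earlier sets in, which is what makes the collective version of the question interesting.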

Later: Of course, I’ve done a thing I usually yell at other people for doing. I have several goals in mind, and I may be conflating one or two of them.

Let me talk about it in terms of aggregated system design goals, rather than individual game dynamics for the “people” in my sketched model: We agree we want to maximize the number of links between people and things (in the two examples I’ve mentioned, and in others like maybe political engagement, or club membership). The goal of “confirmation” means we also want to maximize the number of people looking at each item, so that we have many eyes looking at each thing at once.

My fillip here is that I’m suggesting that we should simultaneously try to maximize the diversity of subsets of people who have looked at items, to reduce correlation between people’s attention to items as much as possible.
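One way to make that last goal concrete (a stand-in metric of my own, not something from the literature) is to count the distinct audience subsets across all items, and compare the two extremes from the Goldilocks point above:

```python
from collections import defaultdict

def audience_diversity(links):
    """Count the distinct subsets of people attached to items."""
    aud = defaultdict(set)
    for person, items in links.items():
        for item in items:
            aud[item].add(person)
    return len({frozenset(people) for people in aud.values()})

disjoint = {"Ada": {1}, "Byron": {2}, "Charles": {3}}             # no overlap
everything = {p: {1, 2, 3} for p in ("Ada", "Byron", "Charles")}  # total overlap

print(audience_diversity(disjoint), audience_diversity(everything))  # → 3 1
```

Maximizing this alongside the total link count and the per-item audience size is exactly the tension: the disjoint extreme maximizes audience diversity but kills confirmation, and the everybody-does-everything extreme does the opposite.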