This is just a telegraphic note to get this out of my head and onto paper quickly.
I’m thinking about crowdsourced proofreading, or recommender systems, or any of several other problems in bipartite network theory (to get all nerdy and crap). In general, suppose there is a collection of people, and each of them is looking at some subset of objects.
If they’re proofreaders looking at transcribed pictures of pages and checking for errors, then if person p checks and signs off a page g, we connect node p to node g with an edge.
If they’re web surfers looking at sites and tweets and shit and checking for awesomeness, then if person p visits and recommends a URI u, we connect node p to node u with an edge.
And so on.
We all know a lot of different ways of scoring the things these people are looking at, don’t we, Nerdy? PageRank, a bunch of network theory things, &c &c.
Me, I’m musing about something slightly different. I think. I’m thinking about confirmation, and collaboration, and how the step from one to two people vouching for something is “worth” more than the step from 17 to 18 people vouching for it.
Let’s score things this way:
- For each person, observe the set of objects to which they’re linked. Suppose for example that Ada linked to items #1, #2 and #3, Byron linked to item #2, and Charles linked to item #2.
- Similarly, for each item, collect the set of people who are linked to it. In our example, item #1 is linked from Ada, item #2 from Ada, Byron and Charles, and item #3 from Ada.
- The score of a person is increased by 1 for every item to which they link whose audience—the subset of people linking to it—is distinct. In our example, Ada links to item #1 (worth 1 point), item #2 (worth another point, because its audience, namely Ada, Byron and Charles, differs from item #1’s audience of Ada alone), and item #3 (not worth more points, since it is linked from the same subset of people as item #1). Byron’s score is 1 because he links to item #2; so is Charles’s.
- The score of an item is [some statistic of] the scores of the people linked to it. Maybe average, maybe sum; doesn’t really matter to me just now.
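The rules above are short enough to sketch in code. This is a minimal Python version under my own naming assumptions (`score_people`, `score_items` are just labels for the two bullets; `stat=sum` is one arbitrary choice for the item statistic):

```python
from collections import defaultdict

def score_people(links):
    """Score each person: +1 for every distinct audience among the
    items they link to, where an item's audience is the set of
    people linked to it."""
    # Collect each item's audience.
    audience = defaultdict(set)
    for person, items in links.items():
        for item in items:
            audience[item].add(person)
    # Count distinct audiences among each person's items.
    return {person: len({frozenset(audience[i]) for i in items})
            for person, items in links.items()}

def score_items(links, stat=sum):
    """Score each item by [some statistic of] its linkers' scores."""
    people = score_people(links)
    audience = defaultdict(set)
    for person, items in links.items():
        for item in items:
            audience[item].add(person)
    return {item: stat(people[p] for p in linkers)
            for item, linkers in audience.items()}

# The worked example: Ada links #1, #2, #3; Byron and Charles link #2.
links = {"Ada": {1, 2, 3}, "Byron": {2}, "Charles": {2}}
print(score_people(links))  # {'Ada': 2, 'Byron': 1, 'Charles': 1}
print(score_items(links))   # item #2 scores 2 + 1 + 1 = 4 under sum
```

Items #1 and #3 share the audience {Ada}, so they contribute only one point between them, which is the whole trick.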
The interesting thing to me at the moment is understanding the dynamics of discovery as a game here. Ada hasn’t got any way to increase her score without discovering some item #4. If Byron links to item #1 or #3, he increases his score, and also Ada’s score indirectly, because she will then link to three items with different “audiences”. But if Charles follows suit, and links to the same item Byron does, that shared advantage disappears for both of them. If everybody links to everything, the scores are eroded away to 1 across the board.
So what problem am I trying to solve here?
In the antiquated crowdsourcing system of Distributed Proofreaders, there have been numerous serious issues with quality and diversity through the years. Originally proofreaders’ “scores” were just the number of pages they read and signed off as being “correct”; this of course led to click-through gaming that didn’t actually improve the quality of texts. But there’s still a lot of money on the table there, since I often see books in which a dozen consecutive pages have been proofread by the same pair of people.
This, to me, is a concern. Because of Scott Page’s work, among other things.
There’s a Goldilocks point here. If everybody does non-overlapping work, there is no community, and no chance for checking one another’s work. If everybody does everything, there’s a lot of redundancy in the work, and returns diminish quickly. But somewhere in between is a problem of sub-community structure: if two people consistently apply themselves to the same work, there must be an accompanying diminishment in their joint contribution.
What I’m trying to do here is reward coverage.
Similarly, I’m concerned with the sustainable quality of links generated by crowdsourced systems like delicious.com or pinboard.in—or even Google PageRank—as bots spew random-seeming links across the network. At the moment I have a search window open for “genetic programming” on Twitter, and seventeen of the eighteen hits are bots that simply link to Amazon affiliated books.
So anyway: just thinking.
If I had a collection of players and items, what strategy for linking to items would maximize the score of an individual player? The average score of the entire collection?
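For a toy instance, the individual-player half of that question can be brute-forced: hold everyone else’s links fixed, enumerate every subset a new player could link to, and keep the score-maximizing ones. A sketch under my own assumptions (the `best_response` name and the particular fixed players are mine):

```python
from itertools import combinations

def score_people(links):
    """+1 per distinct audience among a person's linked items."""
    audience = {}
    for person, items in links.items():
        for item in items:
            audience.setdefault(item, set()).add(person)
    return {person: len({frozenset(audience[i]) for i in items})
            for person, items in links.items()}

def best_response(others, player, items):
    """Enumerate every nonempty subset of items and return the score
    and the subsets maximizing `player`'s own score, holding the
    other players' links fixed."""
    best_score, best = 0, []
    for r in range(1, len(items) + 1):
        for subset in combinations(items, r):
            links = dict(others, **{player: set(subset)})
            s = score_people(links)[player]
            if s > best_score:
                best_score, best = s, [set(subset)]
            elif s == best_score:
                best.append(set(subset))
    return best_score, best

# Ada and Charles are fixed; what should Byron link to, given that
# item #4 hasn't been discovered by anyone yet?
others = {"Ada": {1, 2, 3}, "Charles": {2}}
score, strategies = best_response(others, "Byron", [1, 2, 3, 4])
print(score, strategies)
```

Note that Byron can never score 4 here: items #1 and #3 have identical audiences no matter what he does, so discovering the untouched item #4 is part of every optimal strategy.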
Later: Of course, I’ve done a thing I usually yell at other people for doing. I have several goals in mind, and I may be conflating one or two of them.
Let me talk about it in terms of aggregated system design goals, rather than individual game dynamics for the “people” in my sketched model: We agree we want to maximize the number of links between people and things (in the two examples I’ve mentioned, and in others like maybe political engagement, or club membership). The goal of “confirmation” means we also want to maximize the number of people looking at each item, so that we have many eyes looking at each thing at once.
My fillip here is that I’m suggesting that we should simultaneously try to maximize the diversity of subsets of people who have looked at items, to reduce correlation between people’s attention to items as much as possible.
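One crude way to operationalize that diversity objective, sketched under my own assumption that “diversity of subsets” just means the count of distinct audiences across all items:

```python
def audience_diversity(links):
    """Count how many distinct subsets of people ("audiences") appear
    across all items: one rough measure of de-correlated attention."""
    audience = {}
    for person, items in links.items():
        for item in items:
            audience.setdefault(item, set()).add(person)
    return len({frozenset(linkers) for linkers in audience.values()})

# In the running example, items #1 and #3 share the audience {Ada}
# while #2 has {Ada, Byron, Charles}: two distinct audiences.
print(audience_diversity({"Ada": {1, 2, 3}, "Byron": {2}, "Charles": {2}}))  # 2

# If everybody links to everything, diversity collapses to 1.
print(audience_diversity({p: {1, 2, 3} for p in ("Ada", "Byron", "Charles")}))  # 1
```

Maximizing this alongside total link count and per-item coverage would be the aggregated version of the game above.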