update: Clarified wording of “bad” and “good” players.
Consider the game RoShamBo (a.k.a. “rock-paper-scissors”, or “RPS”). Suppose we have created a population of deterministic iterated RPS players. Each player obeys a unique strategy, which is deterministic in the sense that whenever a player is in a particular situation, it will always play the same move as dictated by its personal strategy. Players have a finite memory of earlier rounds of play against their current opponent. They begin a game with an empty memory, and after each round they respond to the previous moves of both players. Their memories are of fixed length, though — when they play more than M rounds, they start to forget the oldest results. Thus, a player’s moves in a game are 100% determined by the sequence of moves made by each player.
A game between two players consists of T rounds. On each round, the stake is one quatloo; the winner will gain one, and the loser will lose one.
The strategy of a player with memory M can be described (over-described, frankly) by a set of: one deterministic initial move, nine moves they make in the second round in response to the nine possible outcomes of the first round, the 81 moves they can make in the third round in response to the two previous rounds, and so on up to the 9M moves they have in mind for the outcomes of the previous M rounds. Let’s assume that M≪ T.
Now there’s a lot of redundancy in this representation. But who cares — they’re deterministic! For example, imagine the strategy of a memory-4 player, the StupidScissorsMeister, who plays Scissors (S) on each round until his opponent plays Paper (P), and then switches to playing Rock (R) forevermore, until the end of a game. He doesn’t need all that memory, but just bear with me and give it to him.
First, contrary to what might be your intuition, it turns out that there are some really stupid strategies out there in the world. That is, strategies which tend to lose (obtain a negative score at the end of T rounds of play) more often than win (positive score) against all possible opponents of identical memory. Of course, since this is a zero-sum game, the average stake won over all possible pairs of opponents is exactly zero. But there are plenty of players that lose lose lose lose and then win big once in a blue moon.
Consider memory 0 players: they always make exactly one move (R, or P, or S) in every round. They all win 1/3 of the time, lose 1/3 of the time, and draw the rest of the time. The score at the end of a tournament leads to an exchange of either 0 or T quatloos from the dominating player to the dominated one.
So: describe a really stupid strategy of memory 5, which will lose most often in games of 113 rounds, when paired against all other strategies of memory 5. Are there equivalent really smart strategies of memory 5, which win more often?
The worst players are pretty damned sorry, aren’t they? Poor schmucks. Let’s spice the pot a bit, shall we? Give them a fighting chance.
Second. Let’s give the players one more behavior. They can raise the ante on any round. On each round, each player pays attention to the RPS moves they and their opponent have made, but in addition to choosing the next RPS move, each player also may utter a cry of “raise!” on each round. The initial stake at the beginning of the game is one quatloo; on any round where one player cries “raise!”, they are forcing both players to add an extra quatloo to the pot; on any round where both players cry “raise!”, the pot is doubled. Stakes never revert to a lower level during a game; once a raise takes place, it remains in place for the remaining rounds.
Note that the moves a player makes do not depend on the size of the pot; they can’t actually tell what’s at stake. Under these circumstances, can you say whether all the poor players remain poor?
Is there a “raise!” strategy that can compensate the poor players, so that they tend to make more than they would, on average, in the basic game?
Third, suppose the players can watch the size of the kitty. That is, they have a separate strategy for any given level of stakes. They both start with stake-1 strategy, and when it has reached 2 quatloos, they both switch to their stake-2 strategy; when it’s 189 quatloos, they both use stake-189 strategies. Still, as you might imagine, deterministically.
What then?
Finally: It is informative to compare the dynamics of these players in the aggregate (all possible strategies of memory M in T round games), as opposed to a kind of evolutionary situation. Suppose for each of the three games outlined above, there is a population of 1000 players, initially chosen at random. Each plays a tournament against all the rest. At the end of a single all-vs-all tournament, the 900 players with the highest scores are retained for the next tournament, but the lowest-scoring 100 are replaced with 100 fresh players. We play many, many, many tournaments.
What will happen if the replacements are randomly-chosen new strategies in each tournament? What will happen if they’re duplicates of the 100 highest-scorers? What if they’re duplicates of the 100 worst-scoring (retained) players, those ranked 801-900 in the last tournament? What if they’re ten duplicates each of the ten worst (retained) players?
Will these variations stabilize to fixed strategies in fixed proportions? Are there situations where poor strategies come to hold an advantage in the context of the current population? Is there a trend in the average scores of the players in the game, over many tournaments?
Bonus questions: Create a graph where you connect every possible pairing of different players with an edge, and label every edge with the net flow of quatloos in a tournament of T rounds. While there is no net flow in the system, even when stakes-increasing can occur, what does this “economy” look like, under the various all-vs-all situations?
What does the economy graph look like over time? What is the asymptotic structure, as T grows very large?
What does the economy graph look like in a finite population of players? As the contents of the population change?
What happens if players start with an initial wealth of 1000 quatloos, and their wealth cannot drop below 0? That is, if a round won against a bankrupt player results in no exchange of funds.

