Another Journal Club article - this one discussed by Celia Heyes on Wednesday:
The tournament reported in this article, whose results were one of the focal points of the EHBEA meeting I attended in St Andrews last year, was modelled on Axelrod and Hamilton's famous Iterated Prisoner's Dilemma tournament, which resulted in the identification of "tit for tat" as an evolutionarily stable strategy and the invigoration of theoretical and empirical work on the evolution of cooperation. But the present tournament's results are not so clear-cut. Reading through the article, I got a vague sense that the design of the tournament game was overly complex (designed by committee, perhaps?). The discussion at Journal Club helped pin down where that sense came from.
There are only three moves available to agents in each round of the game: INNOVATE (picks a strategy at random - supposed to model individual learning), OBSERVE (picks a strategy that a neighbour has used - to model social learning) and EXPLOIT (uses a strategy and obtains a payoff). This seems simple enough - although I did wonder whether, since pitting individual against social learning is the main test under consideration, it might have been clearer to put EXPLOIT in a separate phase of each round (i.e. in each round, an agent either OBSERVEs, INNOVATEs, or perhaps does nothing, then always EXPLOITs); the actual design muddies the waters by also testing whether it is more efficient to spend time exploiting known strategies or observing/innovating new ones.
But Celia pointed out something that I had completely missed in the article and the conference discussion, and which was indeed buried deep in the Supplementary Material: the EXPLOIT move also updates the current payoff value of whichever strategy is chosen, so there is a form of individual learning built into this move as well as the INNOVATE move. Furthermore, the OBSERVE move does not always copy a strategy faithfully from a neighbour, but has a non-zero probability of returning some other strategy instead - so there seems to be an element of individual learning built into this move as well. Given that both these other moves contain a degree of individual learning, and that the INNOVATE move returns a strategy completely at random, it is not clear why any agent would ever want to use INNOVATE at all. This makes the main finding of the tournament - that the winning agents mostly used the EXPLOIT move, mixed with a bit of OBSERVE and hardly any INNOVATE - rather unsurprising.
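To make the point concrete, here is a minimal Python sketch of the three moves as described above. It is not the tournament's actual code: the class and method names, the pool of 100 strategies, the 10% copy-error rate and the assumption that an agent learns a strategy's payoff exactly are all my own placeholders.

```python
import random

N_STRATEGIES = 100   # assumed size of the strategy pool
COPY_ERROR = 0.1     # assumed chance that OBSERVE returns some other strategy


class Agent:
    """Hypothetical agent; the repertoire maps strategy id -> last payoff seen."""

    def __init__(self):
        self.repertoire = {}

    def innovate(self, payoffs, rng):
        """Learn one strategy drawn uniformly at random (individual learning)."""
        s = rng.randrange(len(payoffs))
        self.repertoire[s] = payoffs[s]
        return s

    def observe(self, neighbour_strategy, payoffs, rng, copy_error=COPY_ERROR):
        """Copy a neighbour's strategy, but sometimes end up with a different one."""
        s = neighbour_strategy
        if rng.random() < copy_error:
            others = [t for t in range(len(payoffs)) if t != neighbour_strategy]
            s = rng.choice(others)
        self.repertoire[s] = payoffs[s]
        return s

    def exploit(self, strategy, payoffs):
        """Play a known strategy, collect its payoff and refresh the stored value."""
        if strategy not in self.repertoire:
            raise ValueError("can only exploit a strategy already in the repertoire")
        self.repertoire[strategy] = payoffs[strategy]
        return payoffs[strategy]


rng = random.Random(0)
payoffs = [rng.expovariate(1.0) for _ in range(N_STRATEGIES)]  # assumed payoff distribution
agent = Agent()
s = agent.innovate(payoffs, rng)
payoffs[s] += 1.0                    # the environment changes behind the agent's back
gained = agent.exploit(s, payoffs)   # EXPLOIT pays out and updates the agent's knowledge
```

In this sketch, `exploit` both earns a payoff and corrects the agent's stale estimate, and `observe` occasionally hands back a strategy nobody demonstrated, so both moves do some of the work that INNOVATE is supposed to do.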
Perhaps one way to make the INNOVATE move more attractive would have been to add some directionality to it, rather than making it simply return a random strategy. In the real world, people do not innovate randomly from a blank slate, but try to improve on an existing strategy in their repertoire (which will often have a social origin). There is no guarantee that their efforts will result in an improvement, but the probability of improvement is something that could have been varied as a parameter - just as a way to give some advantage to using INNOVATE.
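As a rough illustration of what I mean, a directed INNOVATE might look something like the Python sketch below. Everything here is hypothetical: the function name, the `p_improve` parameter, and the assumption that innovation only ever returns strategies the agent does not already know.

```python
import random


def directed_innovate(repertoire, payoffs, p_improve, rng):
    """Return a new strategy id; with probability p_improve it beats the best known one.

    repertoire: dict mapping known strategy id -> payoff
    payoffs:    list of current payoffs, indexed by strategy id
    """
    unknown = [s for s in range(len(payoffs)) if s not in repertoire]
    if not unknown:
        return None  # nothing left to discover
    best_known = max(repertoire.values(), default=float("-inf"))
    better = [s for s in unknown if payoffs[s] > best_known]
    if better and rng.random() < p_improve:
        # Directed case: the innovation improves on the agent's best strategy.
        return rng.choice(better)
    # Undirected fallback: any unknown strategy, uniformly at random
    # (which may still turn out to be an improvement by luck).
    return rng.choice(unknown)


rng = random.Random(42)
payoffs = [1.0, 2.0, 3.0, 4.0, 5.0]
new_strategy = directed_innovate({2: 3.0}, payoffs, p_improve=0.5, rng=rng)
```

Setting `p_improve` to 0 recovers the purely random move, so the same tournament could be rerun across a range of values to see at what point INNOVATE starts to earn its place.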
Anyway, that's enough criticism from me! Kudos to the designers for putting together such a groundbreaking tournament. It's just a fact of life that many studies are more interesting for the mistakes they make (and hence, the learning opportunities they create) than for their actual findings. What I really want to do now is get into some modelling, and work out how the entrants' agents would have done if the rules of the game had been more like what I outline above (with EXPLOIT in a separate phase of each round and INNOVATE biased to pick a better strategy). I wonder if I could find out all the entrants' algorithms - maybe from Luke Rendell - must get in contact with him about that.