Using a theory of mind to find best responses to memory-one strategies

Memory-one strategies are a set of Iterated Prisoner’s Dilemma strategies that have been praised for their mathematical tractability and performance against single opponents. This manuscript investigates best-response memory-one strategies that hold a theory of mind of their opponents. The results add to the literature showing that extortionate play is not always optimal: here, optimal play is often not extortionate. They also provide evidence that memory-one strategies suffer from their limited memory in multi-agent interactions and can be outperformed by optimised strategies with longer memory. We have developed a theory that has allowed us to explore the entire space of memory-one strategies. The framework presented is suitable for studying memory-one strategies not only in the Prisoner’s Dilemma but also in evolutionary processes such as the Moran process. Results on the stability of defection in populations of memory-one strategies are also obtained.
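
To make the tractability claim concrete, the following is a minimal sketch of the standard Markov chain view of a match between two memory-one strategies. It is not taken from the manuscript. The function names, the ordering of states as (CC, CD, DC, DD) from the first player's perspective, and the payoff values (R, S, T, P) = (3, 0, 5, 1) are all assumptions made for illustration.

```python
import numpy as np


def transition_matrix(p, q):
    """Markov chain over states (CC, CD, DC, DD), seen from player 1.

    p and q hold each player's probability of cooperating after CC, CD, DC, DD,
    written from that player's own perspective (assumed convention).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Player 2 sees CD and DC swapped relative to player 1.
    q_seen = q[[0, 2, 1, 3]]
    return np.column_stack(
        [p * q_seen, p * (1 - q_seen), (1 - p) * q_seen, (1 - p) * (1 - q_seen)]
    )


def stationary_distribution(M):
    """Solve v M = v with sum(v) = 1 by least squares.

    Assumes the stationary distribution is unique.
    """
    n = M.shape[0]
    A = np.vstack([M.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v


def long_run_payoff(p, q, payoffs=(3, 0, 5, 1)):
    """Player 1's long-run average payoff; payoffs are (R, S, T, P), assumed values."""
    v = stationary_distribution(transition_matrix(p, q))
    return float(v @ np.asarray(payoffs, dtype=float))
```

Under these assumptions, `long_run_payoff((0.5, 0.5, 0.5, 0.5), (0.5, 0.5, 0.5, 0.5))` gives 2.25, and an unconditional defector against an unconditional cooperator, `long_run_payoff((0, 0, 0, 0), (1, 1, 1, 1))`, gives 5.0. Some deterministic pairings (for example Tit For Tat against itself) have more than one stationary distribution, so the sketch does not give a meaningful answer for them.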
