Non-Parametric Bayesian Inference of Strategies in Repeated Games

Inferring underlying cooperative and competitive strategies from human behaviour in repeated games is important for accurately characterizing human behaviour and understanding how people reason strategically. Finite automata, a bounded model of computation, have been used extensively to represent strategies for these games compactly and are a standard tool in game-theoretic analyses. However, inferring these strategies from behaviour in repeated games is challenging: the number of possible strategies grows exponentially with the number of repetitions, while behavioural data are often sparse and noisy. As a result, previous approaches begin by specifying a fixed, finite hypothesis space of automata, which precludes the discovery of novel strategies that humans may use but that current theory does not anticipate a priori. Here we present a new probabilistic model for strategy inference in repeated games that exploits non-parametric Bayesian modelling. With simulated data, we show that the model rapidly infers the true strategy from limited observations, which leads to accurate predictions of future behaviour. When applied to experimental data of human behaviour in a repeated prisoner's dilemma, we uncover strategies of varying complexity and diversity.
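To make the automaton representation concrete, the sketch below (not the paper's implementation; all names are illustrative) encodes two classic prisoner's-dilemma strategies mentioned in the literature, tit-for-tat and win-stay-lose-shift, as Moore machines: each state emits an action and transitions on the opponent's last move.

```python
C, D = "C", "D"  # cooperate, defect

# Tit-for-tat: copy the opponent's previous move.
TIT_FOR_TAT = {
    "start": "sC",
    "emit": {"sC": C, "sD": D},
    # trans[state][opponent_move] -> next state
    "trans": {"sC": {C: "sC", D: "sD"},
              "sD": {C: "sC", D: "sD"}},
}

# Win-stay, lose-shift (Pavlov): repeat your move after a good
# outcome (opponent cooperated, or mutual defection ended), switch
# after a bad one.
WIN_STAY_LOSE_SHIFT = {
    "start": "sC",
    "emit": {"sC": C, "sD": D},
    "trans": {"sC": {C: "sC", D: "sD"},
              "sD": {C: "sD", D: "sC"}},
}

# Always defect: a single absorbing state.
ALWAYS_DEFECT = {
    "start": "sD",
    "emit": {"sD": D},
    "trans": {"sD": {C: "sD", D: "sD"}},
}

def play(auto_a, auto_b, rounds):
    """Run two automaton strategies against each other and
    return the joint action history."""
    sa, sb = auto_a["start"], auto_b["start"]
    history = []
    for _ in range(rounds):
        a, b = auto_a["emit"][sa], auto_b["emit"][sb]
        history.append((a, b))
        sa = auto_a["trans"][sa][b]  # A reacts to B's move
        sb = auto_b["trans"][sb][a]  # B reacts to A's move
    return history

# Win-stay-lose-shift alternates against a constant defector:
# it cooperates, is exploited, shifts to defection, then shifts back.
print(play(WIN_STAY_LOSE_SHIFT, ALWAYS_DEFECT, 4))
# → [('C', 'D'), ('D', 'D'), ('C', 'D'), ('D', 'D')]
```

The key point for inference is that each strategy is fully specified by a small transition table, so a non-parametric prior over such tables can place mass on machines of unbounded size without fixing the hypothesis space in advance.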
