Automated equilibrium analysis of repeated games with private monitoring: a POMDP approach

The present paper investigates repeated games with imperfect private monitoring, where each player privately receives a noisy observation (signal) of the opponent’s action. Such games have received considerable attention in the AI and economics literature. Identifying pure strategy equilibria in this class has long been known as a hard open problem. Recently, we showed that the theory of partially observable Markov decision processes (POMDPs) can be applied to identify a class of equilibria in which the equilibrium behavior can be described by a finite state automaton (FSA). However, that work did not provide a practical method or a program for applying the general idea to actual problems. We first develop a program that acts as a wrapper around a standard POMDP solver: it takes a description of a repeated game with private monitoring and an FSA as inputs, and automatically checks whether the FSA constitutes a symmetric equilibrium. We apply our program to the repeated Prisoner’s Dilemma and find a novel class of FSAs, which we call k-period mutual punishment (k-MP). A k-MP strategy starts with cooperation and defects after observing a defection; it restores cooperation after observing defections k times in a row. Our program enables us to exhaustively search over all FSAs with at most three states, and we find that, for some range of parameter values, 2-MP beats all other pure strategy equilibria with at most three states and is more efficient in equilibrium than the grim trigger strategy.
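The k-MP strategy described above can be sketched as a small finite state automaton: one cooperative state plus k punishment states that count consecutive defection signals. The sketch below is illustrative, not the authors' implementation; the state encoding, the signal labels 'c'/'d', and the choice to restart the count when a cooperation signal breaks a run of defections are our assumptions.

```python
def make_k_mp(k):
    """Return (initial_state, action_fn, transition_fn) for a k-MP automaton.

    States: 0 is the cooperative phase; 1..k are punishment states,
    where state s means s consecutive defection signals have been
    observed since punishment began (k + 1 states total, so 2-MP
    has three states, matching the exhaustive search in the paper).
    NOTE: this is an illustrative sketch of the strategy described in
    the abstract, not the paper's code.
    """
    def action(state):
        # Cooperate only in the cooperative phase; defect while punishing.
        return 'C' if state == 0 else 'D'

    def transition(state, signal):
        # signal is the noisy private observation of the opponent's
        # action: 'c' (looks cooperative) or 'd' (looks like defection).
        if state == 0:
            # A defection signal triggers the punishment phase.
            return 1 if signal == 'd' else 0
        if signal == 'd':
            # k-th consecutive defection observed: restore cooperation.
            return 0 if state == k else state + 1
        # A cooperation signal broke the run; restart the count
        # (assumed behavior for illustration).
        return 1

    return 0, action, transition
```

For example, under 2-MP the play after a defection signal is: punish, observe 'd', punish again, observe 'd', then return to cooperation.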
