Solving two-person zero-sum repeated games of incomplete information

In repeated games with incomplete information, rational agents must weigh the tradeoff between exploiting their information for a short-term gain and concealing it so as to preserve a long-term informational advantage. The theory of infinitely repeated two-player zero-sum games with incomplete information has been studied extensively, beginning with the seminal work of Aumann and Maschler. While this theoretical work has produced a characterization of optimal strategies, algorithms for computing those strategies have not yet been studied. For the case where one player is informed about the true state of the world and the other player is uninformed, we provide a non-convex mathematical programming formulation for computing the value of the game, as well as optimal strategies for the informed player. We then describe an efficient algorithm for solving this difficult optimization problem to within arbitrary accuracy. We also discuss how to efficiently compute optimal strategies for the uninformed player using the output of our algorithm.
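As background for the "value of the game" mentioned above, the classical Aumann and Maschler characterization says that, with one informed player, the value of the infinitely repeated game at prior p is cav u(p): the concave envelope of u, where u(p) is the value of the one-shot "non-revealing" game whose payoff matrix is the prior-weighted average of the state matrices. The sketch below illustrates that characterization only; it is not the paper's non-convex program or its algorithm. It assumes two states, 2x2 payoff matrices, and a uniform grid over the prior, and the matrices `A1`, `A2` and all function names are hypothetical choices made for illustration.

```python
# Illustrative sketch (not the paper's algorithm): approximate cav u(p)
# on a grid for a two-state game with 2x2 payoff matrices.
# A1, A2 and every function name here are assumptions for illustration.

A1 = [[-1.0, 0.0], [0.0, 0.0]]   # row player's payoffs in state 1
A2 = [[0.0, 0.0], [0.0, -1.0]]   # row player's payoffs in state 2


def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin >= minimax - 1e-12:           # pure saddle point exists
        return maximin
    return (a * d - b * c) / (a + d - b - c)  # fully mixed equilibrium


def nonrevealing_value(p):
    """u(p): value of the average game p*A1 + (1-p)*A2."""
    avg = [[p * A1[i][j] + (1 - p) * A2[i][j] for j in range(2)]
           for i in range(2)]
    return value_2x2(avg)


def upper_hull(points):
    """Upper concave envelope of points sorted by x (monotone chain)."""
    hull = []
    for pt in points:
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (pt[1] - oy) - (ay - oy) * (pt[0] - ox)
            if cross >= 0:       # middle point lies on or below the chord
                hull.pop()
            else:
                break
        hull.append(pt)
    return hull


def cav_u(p, grid_size=100):
    """Grid approximation of cav u at prior p (p in [0, 1])."""
    xs = [i / grid_size for i in range(grid_size + 1)]
    hull = upper_hull([(x, nonrevealing_value(x)) for x in xs])
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= p <= x1:
            t = (p - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("p must lie in [0, 1]")


if __name__ == "__main__":
    print("u(0.5)     =", nonrevealing_value(0.5))
    print("cav u(0.5) =", cav_u(0.5))
```

For these example matrices u(p) = -p(1-p) is convex, so its concave envelope is the chord between the endpoints and cav u lies strictly above u in the interior: the informed player does better by revealing the state than by concealing it. A grid-based envelope like this only works for very small state spaces, which is one reason a dedicated algorithm is of interest.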

[1]  R. Tyrrell Rockafellar,et al.  Convex Analysis , 1970, Princeton Landmarks in Mathematics and Physics.

[2]  Michael P. Wellman,et al.  Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[3]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[4]  Peter Stone,et al.  A polynomial-time nash equilibrium algorithm for repeated games , 2003, EC '03.

[5]  Dinah Rosenberg,et al.  "Cav u" and the Dual Game , 1999, Math. Oper. Res..

[6]  Robert J. Aumann,et al.  Repeated Games with Incomplete Information , 1995 .

[7]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1979, W. H. Freeman.

[8]  C. Carathéodory Über den Variabilitätsbereich der Fourier’schen Konstanten von positiven harmonischen Funktionen , 1911 .

[9]  Jean-François Mertens,et al.  The value of two-person zero-sum repeated games with lack of information on both sides , 1971 .

[10]  S. Sorin A First Course on Zero Sum Repeated Games , 2002 .

[11]  Sergiu Hart,et al.  Nonzero-Sum Two-Person Repeated Games with Incomplete Information , 1985, Math. Oper. Res..

[12]  D. Fudenberg,et al.  The Theory of Learning in Games , 1998 .

[13]  Bernard De Meyer,et al.  Repeated Games, Duality and the Central Limit Theorem , 1996, Math. Oper. Res..

[14]  S. Sorin,et al.  The LP formulation of finite zero-sum games with incomplete information , 1980 .

[15]  Yuval Rabani,et al.  Linear Programming , 2007, Handbook of Approximation Algorithms and Metaheuristics.

[16]  Roger B. Myerson,et al.  Game theory - Analysis of Conflict , 1991 .

[17]  Ronen I. Brafman,et al.  A near-optimal polynomial time algorithm for learning in certain classes of stochastic games , 2000, Artif. Intell..

[18]  L. G. Khachiyan A polynomial algorithm in linear programming , 1979 .

[19]  Stephen J. Wright Primal-Dual Interior-Point Methods , 1997, Other Titles in Applied Mathematics.

[21]  L. Khachiyan Polynomial algorithms in linear programming , 1980 .

[23]  J. Neumann Zur Theorie der Gesellschaftsspiele , 1928 .

[24]  Kem Knapp Sawyer The U.S. arms control and disarmament agency , 1990 .

[25]  L. Shapley,et al.  Stochastic Games , 1953, Proceedings of the National Academy of Sciences.

[26]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[27]  Shmuel Zamir,et al.  Repeated games of incomplete information: Zero-sum , 1992 .

[28]  D. Blackwell An analog of the minimax theorem for vector payoffs , 1956 .