A Sampling-Based Approach to Computing Equilibria in Succinct Extensive-Form Games

A central task of artificial intelligence is the design of artificial agents that act towards specified goals in partially observed environments. Because such environments frequently involve interaction over time with other agents that have their own goals, reasoning about this interaction relies on sequential game-theoretic models such as extensive-form games, or on succinct representations of them such as multi-agent influence diagrams. Current algorithms for calculating equilibria either work with inefficient representations, possibly doubly exponential in the number of time steps, or place strong assumptions on the game structure. In this paper, we propose a sampling-based approach that calculates extensive-form correlated equilibria with small representations without placing such strong assumptions. It is therefore practical in situations where previous approaches would fail. In addition, our algorithm allows control over characteristics of the target equilibrium; for example, we can ask for an equilibrium with high social welfare. Our approach combines a multiplicative-weight update algorithm analogous to AdaBoost with Markov chain Monte Carlo sampling. We prove convergence guarantees and explore the utility of our approach on several moderately sized multi-player games.
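To make the phrase "multiplicative-weight update" concrete, the following is a minimal sketch of the generic Hedge-style update over a finite set of options. It is not the algorithm of this paper, which additionally relies on Markov chain Monte Carlo sampling over game strategies; the function names `hedge_update` and `run_hedge`, the learning rate `eta`, and the toy loss vectors are illustrative assumptions only.

```python
import math


def hedge_update(weights, losses, eta=0.5):
    """One generic multiplicative-weights (Hedge) step.

    Each option's weight is scaled by exp(-eta * loss), so options with
    higher loss lose probability mass; the result is renormalized.
    `eta` is an assumed learning rate, not a value from the paper.
    """
    updated = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]


def run_hedge(loss_rounds, eta=0.5):
    """Apply the update over a sequence of loss vectors, starting uniform."""
    n = len(loss_rounds[0])
    weights = [1.0 / n] * n
    for losses in loss_rounds:
        weights = hedge_update(weights, losses, eta)
    return weights


if __name__ == "__main__":
    # Toy data (hypothetical): option 0 never incurs loss, the others always do.
    rounds = [[0.0, 1.0, 1.0]] * 10
    print(run_hedge(rounds))
```

With the toy losses above, the weight on the loss-free option grows toward 1 as rounds accumulate, which is the basic behaviour that boosting-style methods exploit.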
