Paper Information - A Sampling-Based Approach to Computing Equilibria in Succinct Extensive-Form Games

A central task of artificial intelligence is the design of artificial agents that act towards specified goals in partially observed environments. Since such environments frequently include interaction over time with other agents with their own goals, reasoning about such interaction relies on sequential game-theoretic models such as extensive-form games or some of their succinct representations such as multi-agent influence diagrams. The current algorithms for calculating equilibria either work with inefficient representations, possibly doubly exponential in the number of time steps, or place strong assumptions on the game structure. In this paper, we propose a sampling-based approach, which calculates extensive-form correlated equilibria with small representations without placing such strong assumptions. Thus, it is practical in situations where the previous approaches would fail. In addition, our algorithm allows control over characteristics of the target equilibrium, e.g., we can ask for an equilibrium with high social welfare. Our approach is based on a multiplicative-weight update algorithm analogous to AdaBoost, and Markov chain Monte Carlo sampling. We prove convergence guarantees and explore the utility of our approach on several moderately sized multi-player games.
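The abstract names a multiplicative-weight update analogous to AdaBoost as one ingredient of the approach. As a rough illustration of that general idea only, the sketch below implements the standard Hedge update over a small fixed set of options. It is not the paper's algorithm: the function names `hedge_update` and `run_hedge`, the learning rate `eta`, and the toy loss vectors are all assumptions made for this example, and the Markov chain Monte Carlo sampling component is not shown.

```python
import math


def hedge_update(weights, losses, eta=0.5):
    """One Hedge step: scale each weight by exp(-eta * loss), then renormalize.

    `weights` is a probability vector; `losses` holds one loss per option.
    `eta` is a hypothetical learning rate chosen for illustration.
    """
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run_hedge(loss_rounds, eta=0.5):
    """Start from uniform weights and apply one Hedge step per round of losses."""
    n = len(loss_rounds[0])
    weights = [1.0 / n] * n
    for losses in loss_rounds:
        weights = hedge_update(weights, losses, eta)
    return weights


if __name__ == "__main__":
    # Toy data (assumed): option 0 always has the lowest loss, option 1 the highest.
    rounds = [[0.0, 1.0, 0.5]] * 20
    print(run_hedge(rounds))
```

Options with consistently lower loss accumulate weight exponentially faster than the others, which is the property shared by Hedge and AdaBoost-style updates.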

Miroslav Dudík | Geoffrey J. Gordon

[1] H. W. Kuhn, et al. Extensive Games and the Problem of Information, 1953.

[2] M. Spence. Job Market Signaling, 1973.

[3] R. Aumann. Subjectivity and Correlation in Randomized Strategies, 1974.

[4] D. Koller, et al. The complexity of two-person zero-sum games in extensive form, 1992.

[5] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[6] Bernhard von Stengel, et al. Computing Normal Form Perfect Equilibria for Extensive Two-Person Games, 2002.

[7] S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.

[8] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[9] Y. Freund, et al. Adaptive game playing using multiplicative weights, 1999.

[10] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.

[11] Daphne Koller, et al. Multi-Agent Influence Diagrams for Representing and Solving Games, 2001, IJCAI.