Inference-based Decision Making in Games

Background: Reinforcement learning in complex games has traditionally been the domain of value or policy iteration algorithms, owing to their effectiveness for planning in Markov decision processes. Later, algorithms with regret minimization guarantees, such as upper confidence bounds applied to trees (UCT) and counterfactual regret minimization, were developed and also proved very successful. Meanwhile, remarkably simple algorithms based on likelihood maximization were found for planning in Markov decision processes, which opened up room for new research. Applying these new methods to extensive games is the focus of this thesis.

Results: We describe a generic schema for transforming an extensive game into a multi-agent partially observable Markov decision process (POMDP), derive a strategy update based on the EM algorithm, and give an implementation using the hidden Markov model. Tests on a number of minimalistic games suggest that, in the two-player case, equilibrium strategies are found if the game has pure Nash equilibria; otherwise only the average payoffs of the two players converge to their respective values under a mixed Nash equilibrium, i.e. no equilibrium strategies are found. Further investigation showed that the algorithmic framework is general enough to allow the M-step to be replaced by other update procedures, such as the polynomial weights algorithm (resulting in external regret minimization) or the counterfactual regret minimization method; minimal sketches of both updates follow the declaration below. With the latter update, the strategies do converge.

Statutory Declaration (Eidesstattliche Erklärung): I hereby declare in lieu of oath that this thesis was written by no one other than myself. All aids used, such as reports, books, web pages, and the like, are listed in the bibliography, and quotations from other works are marked as such. This thesis has not previously been submitted in the same or a similar form to any other examination board, nor has it been published. Berlin, 7 June 2011
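As an illustration of the M-step replacements mentioned in the Results, the following is a minimal Python sketch of the polynomial weights update for a single decision point. The function names and the assumption of a per-round loss vector in [0, 1] are illustrative choices, not the thesis's actual implementation.

    # Polynomial weights: shrink each action's weight in proportion to its loss.
    # Playing the normalized weights yields an external-regret-minimizing strategy.
    def polynomial_weights_update(weights, losses, eta=0.1):
        # One round of the update; losses are assumed to lie in [0, 1].
        return [w * (1.0 - eta * l) for w, l in zip(weights, losses)]

    def strategy(weights):
        # Normalize the weights into a mixed strategy over actions.
        total = sum(weights)
        return [w / total for w in weights]

    # Usage: two actions; action 0 suffers loss 1.0, action 1 suffers loss 0.0.
    w = [1.0, 1.0]
    w = polynomial_weights_update(w, [1.0, 0.0])
    print(strategy(w))  # probability mass shifts toward action 1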
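Likewise, here is a sketch of regret matching, the per-information-set rule at the core of counterfactual regret minimization; the two-action setup and the counterfactual utilities are assumed for illustration only.

    # Regret matching: play actions in proportion to their positive cumulative regret.
    def regret_matching(cum_regret):
        positive = [max(r, 0.0) for r in cum_regret]
        total = sum(positive)
        if total > 0.0:
            return [p / total for p in positive]
        # No action has positive regret: fall back to the uniform strategy.
        n = len(cum_regret)
        return [1.0 / n] * n

    def update_regrets(cum_regret, action_utils, strategy_util):
        # Accumulate this round's regret u(a) - u(sigma) for every action a.
        return [r + u - strategy_util for r, u in zip(cum_regret, action_utils)]

    # Usage: action 1 would have earned 1.0 where the current strategy earned 0.5.
    R = update_regrets([0.0, 0.0], [0.0, 1.0], 0.5)
    print(regret_matching(R))  # all probability mass on action 1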
