Multi-Agent Reinforcement Learning: A Critical Survey

We survey the recent work in AI on multi-agent reinforcement learning (that is, learning in stochastic games). We then argue that, while exciting, this work is flawed. The fundamental flaw is a lack of clarity about the problem or problems being addressed. After tracing a representative sample of the recent literature, we identify four well-defined problems in multi-agent reinforcement learning, single out the problem that in our view is most suitable for AI, and make some remarks about how we believe progress is to be made on this problem.
