Empirically Evaluating Multiagent Learning Algorithms

There exist many algorithms for learning how to play repeated bimatrix games. Most of these algorithms are justified in terms of some sort of theoretical guarantee. On the other hand, little is known about their empirical performance. Most empirical claims in the literature have been based on small experiments, which has hampered understanding as well as the development of new multiagent learning (MAL) algorithms. We have developed a new suite of tools for running multiagent experiments: the MultiAgent Learning Testbed (MALT). These tools are designed to facilitate larger and more comprehensive experiments by removing the need to build one-off experimental code. MALT also provides baseline implementations of many MAL algorithms, which should reduce or eliminate differences between algorithm implementations and increase the reproducibility of results. Using this test suite, we ran an experiment unprecedented in size. We analyzed the results according to a variety of performance metrics, including reward, maxmin distance, regret, and several notions of equilibrium convergence. We confirmed several pieces of conventional wisdom but also discovered some surprising results. For example, we found that single-agent Q-learning outperformed many more complicated and more modern MAL algorithms.
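
To make the setting and metrics concrete, here is a minimal, hypothetical sketch (in Python with NumPy) of the kind of agent and measurement the abstract refers to: a stateless epsilon-greedy Q-learner playing one side of a repeated Prisoner's Dilemma, with average reward and external regret computed afterwards. The payoff matrix, the fixed stochastic opponent, and all parameters are illustrative assumptions, not MALT's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Row player's payoff matrix for the Prisoner's Dilemma
# (action 0 = cooperate, action 1 = defect). Values are illustrative.
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])

def run(T=10_000, alpha=0.1, eps=0.1):
    """Stateless epsilon-greedy Q-learning against a fixed mixed strategy."""
    Q = np.zeros(2)
    # Assumed opponent: cooperates and defects with equal probability.
    opp_actions = rng.choice(2, size=T, p=[0.5, 0.5])
    rewards = np.zeros(T)
    for t in range(T):
        # Epsilon-greedy action selection over the two row actions.
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q))
        r = A[a, opp_actions[t]]
        Q[a] += alpha * (r - Q[a])  # one-state temporal-difference update
        rewards[t] = r
    # External regret: best fixed action in hindsight minus realized reward.
    best_fixed = max(A[a_i, opp_actions].sum() for a_i in range(2))
    return rewards.sum() / T, (best_fixed - rewards.sum()) / T

avg_reward, avg_regret = run()
print(f"average reward {avg_reward:.3f}, average regret {avg_regret:.3f}")
```

Against a stationary opponent like this one, the Q-learner's per-round regret should approach zero (up to exploration noise), which illustrates why a simple single-agent learner can be a strong baseline in the kind of comparison the paper reports.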
