Agendas for multi-agent learning

Shoham et al. identify several important agendas that can help direct research in multi-agent learning. We propose two additional agendas, called "modelling" and "design", which cover the problems we must consider before our agents can start learning. We then examine research goals for modelling, design, and learning, and identify the problem of finding learning algorithms that guarantee convergence to Pareto-dominant equilibria against a wide range of opponents. Finally, we conclude with an example: starting from an informally specified multi-agent learning problem, we illustrate how one might formalize and solve it by stepping through the tasks of modelling, design, and learning.
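To illustrate the kind of convergence question raised here, the following is a minimal sketch (not from the paper) of fictitious play in a two-player coordination game. The payoff matrix, the optimistic priors, and the use of fictitious play are all illustrative choices of ours; with these priors the players settle on the Pareto-dominant equilibrium, but against other priors or other opponents no such guarantee holds, which is precisely the open problem.

```python
import numpy as np

# Symmetric coordination game: A[i, j] is player 0's payoff when player 0
# plays i and player 1 plays j; player 1's payoff is A[j, i].
# (0, 0) is the Pareto-dominant equilibrium; (1, 1) is an inferior one.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def fictitious_play(rounds=100):
    """Each player best-responds to the opponent's empirical action frequencies."""
    # beliefs[p] counts how often player p has seen each opponent action;
    # the initial count (an optimistic prior on action 0) is an assumption.
    beliefs = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
    actions = (0, 0)
    for _ in range(rounds):
        freqs = [b / b.sum() for b in beliefs]
        a0 = int(np.argmax(A @ freqs[0]))  # player 0's best response to its belief
        a1 = int(np.argmax(A @ freqs[1]))  # player 1's best response to its belief
        beliefs[0][a1] += 1  # player 0 observes player 1's action
        beliefs[1][a0] += 1
        actions = (a0, a1)
    return actions

print(fictitious_play())  # → (0, 0), the Pareto-dominant profile
```

Starting both beliefs on action 1 instead would lock the players into the inferior equilibrium (1, 1), which is why guaranteeing convergence to the Pareto-dominant outcome against arbitrary opponents is nontrivial.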

[1] Jeff G. Schneider, et al. Game Theoretic Control for Robot Teams, 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[2] Yoav Shoham, et al. New Criteria and a New Algorithm for Learning in Multi-Agent Systems, 2004, NIPS.

[3] Geoffrey J. Gordon, et al. Approximate Solutions to Markov Decision Processes, 1999.

[4] Andreu Mas-Colell, et al. A General Class of Adaptive Strategies, 1999, J. Econ. Theory.

[5] Imre Bárány, et al. Fair Distribution Protocols or How the Players Replace Fortune, 1992, Math. Oper. Res.

[6] Sergei Izmalkov, et al. Rational Secure Computation and Ideal Mechanism Design, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[7] Manuela M. Veloso, et al. Simultaneous Adversarial Multi-Robot Learning, 2003, IJCAI.

[8] Brian Knight, et al. Reified Temporal Logics: An Overview, 2001, Artificial Intelligence Review.

[9] Santosh S. Vempala, et al. Efficient Algorithms for Online Decision Problems, 2005, Journal of Computer and System Sciences.

[10] No-Regret Algorithms for Structured Prediction Problems, 2005.

[11] Geoffrey J. Gordon. No-Regret Algorithms for Online Convex Programs, 2006, NIPS.

[12] Yoav Shoham, et al. If Multi-Agent Learning Is the Answer, What Is the Question?, 2007, Artif. Intell.

[13] David M. Kreps, et al. Game Theory and Economic Modelling, 1992.

[14] Yishay Mansour, et al. Nash Convergence of Gradient Dynamics in General-Sum Games, 2000, UAI.

[15] Peter Stone, et al. Reinforcement Learning for RoboCup Soccer Keepaway, 2005, Adapt. Behav.

[16] Geoffrey J. Gordon, et al. Distributed Planning in Hierarchical Factored MDPs, 2002, UAI.

[17] Geoffrey J. Gordon, et al. Multi-Robot Negotiation: Approximating the Set of Subgame Perfect Equilibria in General-Sum Stochastic Games, 2006, NIPS.

[18] Geoffrey J. Gordon, et al. Multi-Robot Coordination and Competition Using Mixed Integer and Linear Programs, 2004.

[19] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[20] Dean P. Foster, et al. Regret in the On-Line Decision Problem, 1999.

[21] Ronen I. Brafman, et al. Efficient Learning Equilibrium, 2004, Artificial Intelligence.

[22] A. Rubinstein. Perfect Equilibrium in a Bargaining Model, 1982.

[23] Michael L. Littman, et al. Abstraction Methods for Game Theoretic Poker, 2000, Computers and Games.

[24] J. Nash. The Bargaining Problem, 1950, Classics in Game Theory.

[25] Peter Stone, et al. TacTex-05: A Champion Supply Chain Management Agent, 2006, AAAI.

[26] Shai Halevi, et al. A Cryptographic Solution to a Game Theoretic Problem, 2000, CRYPTO.

[27] Peter Stone, et al. ATTac-2000: An Adaptive Autonomous Bidding Agent, 2001, AGENTS '01.

[28] Adam Tauman Kalai, et al. Geometric Algorithms for Online Optimization, 2002.

[29] Jonathan Schaeffer, et al. Approximating Game-Theoretic Optimal Strategies for Full-Scale Poker, 2003, IJCAI.

[30] Michael P. Wellman, et al. Walverine: A Walrasian Trading Agent, 2003, AAMAS '03.

[31] S. Hart, et al. A General Class of Adaptive Strategies, 1999.

[32] S. Hart, et al. Long Cheap Talk, 2003.

[33] E. Kalai, et al. Rational Learning Leads to Nash Equilibrium, 1993.