New Criteria and a New Algorithm for Learning in Multi-Agent Systems

We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previously proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion), the algorithm yield a payoff that approaches the payoff of the best response; (b) against other opponents, the algorithm's payoff at least approach (and possibly exceed) the security-level payoff (or maximin value); and (c) subject to these requirements, the algorithm achieve a close-to-optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game-theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms.
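Criterion (b) refers to the security-level (maximin) payoff of a game. As a minimal illustration (not taken from the paper), the sketch below computes the pure-strategy maximin value for the row player of a payoff matrix; the full security level also ranges over mixed strategies, which is omitted here for simplicity. The function name and example game are our own.

```python
# Illustrative sketch of criterion (b)'s baseline: the security-level
# payoff, restricted here to pure strategies. The true maximin value
# also optimizes over mixed strategies (e.g. via linear programming).

def pure_maximin(payoffs):
    """Return the pure-strategy maximin value for the row player.

    payoffs[i][j] is the row player's payoff when the row player
    plays action i and the opponent plays action j. The row player
    picks the action whose worst-case payoff is largest.
    """
    return max(min(row) for row in payoffs)

# Matching Pennies payoffs for the row player: every pure action can
# be exploited, so the pure maximin is -1, while the mixed-strategy
# security level (uniform randomization) is 0.
matching_pennies = [[1, -1], [-1, 1]]
print(pure_maximin(matching_pennies))  # -1
```

A learning algorithm satisfying criterion (b) must guarantee at least this worst-case value against arbitrary opponents, regardless of their behavior.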
