Learning against multiple opponents

We address the problem of learning in repeated n-player (as opposed to 2-player) general-sum games, paying particular attention to the rarely addressed situation in which there is a mixture of agents of different types. We propose new criteria requiring that the agents employing a particular learning algorithm work together to achieve a joint best response against a target class of opponents, while guaranteeing that each achieves at least its individual security-level payoff against any possible set of opponents. We then provide algorithms that provably meet these criteria for two target classes: stationary strategies and adaptive strategies with bounded memory. We also demonstrate that the algorithm for stationary strategies outperforms existing algorithms in tests spanning a wide variety of repeated games with more than two players.
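To make the flavor of these criteria concrete, the sketch below is a minimal, hypothetical illustration (two-player for brevity, not the n-player setting the paper treats, and not the paper's actual algorithm): it best-responds to an empirical model of an opponent assumed to be stationary, and falls back to its maximin (security) strategy whenever its running average payoff drops below the security level. All class and parameter names are invented for this example.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): a two-player learner
# that best-responds to the empirical estimate of an opponent assumed to be
# stationary, and retreats to its maximin ("security") strategy whenever its
# running average payoff falls below the security level.

class SecureBestResponder:
    def __init__(self, payoffs, security_strategy, security_level, tol=0.05, seed=0):
        self.payoffs = np.asarray(payoffs, dtype=float)          # payoffs[own_action, opp_action]
        self.security_strategy = np.asarray(security_strategy)   # maximin mixed strategy
        self.security_level = security_level                     # maximin (security) value
        self.tol = tol
        self.rng = np.random.default_rng(seed)
        self.opp_counts = np.ones(self.payoffs.shape[1])          # Laplace-smoothed counts
        self.total_reward = 0.0
        self.rounds = 0

    def act(self):
        # If the running average payoff has fallen below the security level
        # (minus a tolerance), play the maximin strategy; otherwise best-respond
        # to the estimated stationary opponent strategy.
        if self.rounds > 0 and self.total_reward / self.rounds < self.security_level - self.tol:
            return int(self.rng.choice(len(self.security_strategy), p=self.security_strategy))
        opp_model = self.opp_counts / self.opp_counts.sum()
        return int(np.argmax(self.payoffs @ opp_model))

    def observe(self, own_action, opp_action):
        self.opp_counts[opp_action] += 1
        self.total_reward += self.payoffs[own_action, opp_action]
        self.rounds += 1
```

The paper's algorithms additionally coordinate multiple learners of the same type toward a joint best response; the sketch above only illustrates the individual security-level guarantee against a single opponent.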
