Learning against multiple opponents

We address the problem of learning in repeated n-player (as opposed to 2-player) general-sum games, paying particular attention to the rarely addressed situation in which there is a mixture of agents of different types. We propose new criteria requiring that the agents employing a particular learning algorithm work together to achieve a joint best response against a target class of opponents, while guaranteeing that each achieves at least its individual security-level payoff against any possible set of opponents. We then provide algorithms that provably meet these criteria for two target classes: stationary strategies and adaptive strategies with bounded memory. We also demonstrate that the algorithm for stationary strategies outperforms existing algorithms in tests spanning a wide variety of repeated games with more than two players.
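To make the flavor of these criteria concrete, the sketch below is a minimal, hypothetical illustration (two-player for brevity, not the n-player setting the paper treats, and not the paper's actual algorithm): it best-responds to an empirical model of an opponent assumed to be stationary, and falls back to its maximin (security) strategy whenever its running average payoff drops below the security level. All class and parameter names are invented for this example.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): a two-player learner
# that best-responds to the empirical estimate of an opponent assumed to be
# stationary, and retreats to its maximin ("security") strategy whenever its
# running average payoff falls below the security level.

class SecureBestResponder:
    def __init__(self, payoffs, security_strategy, security_level, tol=0.05, seed=0):
        self.payoffs = np.asarray(payoffs, dtype=float)          # payoffs[own_action, opp_action]
        self.security_strategy = np.asarray(security_strategy)   # maximin mixed strategy
        self.security_level = security_level                     # maximin (security) value
        self.tol = tol
        self.rng = np.random.default_rng(seed)
        self.opp_counts = np.ones(self.payoffs.shape[1])          # Laplace-smoothed counts
        self.total_reward = 0.0
        self.rounds = 0

    def act(self):
        # If the running average payoff has fallen below the security level
        # (minus a tolerance), play the maximin strategy; otherwise best-respond
        # to the estimated stationary opponent strategy.
        if self.rounds > 0 and self.total_reward / self.rounds < self.security_level - self.tol:
            return int(self.rng.choice(len(self.security_strategy), p=self.security_strategy))
        opp_model = self.opp_counts / self.opp_counts.sum()
        return int(np.argmax(self.payoffs @ opp_model))

    def observe(self, own_action, opp_action):
        self.opp_counts[opp_action] += 1
        self.total_reward += self.payoffs[own_action, opp_action]
        self.rounds += 1
```

The paper's algorithms additionally coordinate multiple learners of the same type toward a joint best response; the sketch above only illustrates the individual security-level guarantee against a single opponent.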
