Learning against opponents with bounded memory

Recently, a number of authors have proposed criteria for evaluating learning algorithms in multi-agent systems. While well justified, each of these criteria generally gives little attention to one of the main challenges of a multi-agent setting: the capability of the other agents to adapt and learn as well. We propose extending existing criteria to apply to a class of adaptive opponents with bounded memory. We then present an algorithm that provably achieves an ε-best response against this richer class of opponents while simultaneously guaranteeing a minimum payoff against any opponent and performing well in self-play. The new algorithm also demonstrates strong performance in empirical tests against a variety of opponents in a wide range of environments.
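
To make the bounded-memory opponent model concrete, the sketch below (Python, not the paper's algorithm) shows an opponent whose strategy conditions only on the last K of our actions, and a brute-force search over short cyclic plans for an approximately best response. The game, the payoffs, and the memory-1 tit-for-tat opponent are illustrative assumptions.

```python
import itertools

# Minimal sketch (illustrative, not the paper's algorithm): a bounded-memory
# opponent conditions its play only on the last K of our actions; against such
# an opponent an approximately best response can be found by searching over
# short cyclic plans. Game, payoffs, and the opponent itself are assumptions.

K = 1                  # opponent remembers only our previous action
ACTIONS = ["C", "D"]   # illustrative Prisoner's Dilemma
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(history):
    """Memory-1 opponent: repeats our action from the previous round."""
    return history[-1] if history else "C"

def average_payoff(plan, opponent, rounds=200):
    """Average payoff to us when we cycle through `plan` against `opponent`."""
    history, total = [], 0.0
    for t in range(rounds):
        ours = plan[t % len(plan)]
        theirs = opponent(history)
        total += PAYOFF[(ours, theirs)]
        history.append(ours)
    return total / rounds

# Enumerate all cyclic plans of length up to K + 1; the best of these is an
# epsilon-best response to this particular memory-1 opponent (here, always
# cooperating, with long-run value 3, beats defection, which triggers
# mutual punishment).
plans = (tuple(p) for n in range(1, K + 2)
         for p in itertools.product(ACTIONS, repeat=n))
best = max(plans, key=lambda p: average_payoff(p, tit_for_tat))
print(best, average_payoff(best, tit_for_tat))
```

Against a richer class of memory-K opponents the same idea applies, but the search space and the learning problem grow with K, which is what motivates the bounded-memory restriction in the abstract above.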
