Learning against opponents with bounded memory

Recently, a number of authors have proposed criteria for evaluating learning algorithms in multi-agent systems. While well justified, each of these criteria generally gives little attention to one of the main challenges of a multi-agent setting: the capability of the other agents to adapt and learn as well. We propose extending existing criteria to apply to a class of adaptive opponents with bounded memory. We then present an algorithm that provably achieves an ε-best response against this richer class of opponents while simultaneously guaranteeing a minimum payoff against any opponent and performing well in self-play. The new algorithm also demonstrates strong performance in empirical tests against a variety of opponents in a wide range of environments.
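
To make the bounded-memory opponent model concrete, the sketch below (Python, not the paper's algorithm) shows an opponent whose strategy conditions only on the last K of our actions, and a brute-force search over short cyclic plans for an approximately best response. The game, the payoffs, and the memory-1 tit-for-tat opponent are illustrative assumptions.

```python
import itertools

# Minimal sketch (illustrative, not the paper's algorithm): a bounded-memory
# opponent conditions its play only on the last K of our actions; against such
# an opponent an approximately best response can be found by searching over
# short cyclic plans. Game, payoffs, and the opponent itself are assumptions.

K = 1                  # opponent remembers only our previous action
ACTIONS = ["C", "D"]   # illustrative Prisoner's Dilemma
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(history):
    """Memory-1 opponent: repeats our action from the previous round."""
    return history[-1] if history else "C"

def average_payoff(plan, opponent, rounds=200):
    """Average payoff to us when we cycle through `plan` against `opponent`."""
    history, total = [], 0.0
    for t in range(rounds):
        ours = plan[t % len(plan)]
        theirs = opponent(history)
        total += PAYOFF[(ours, theirs)]
        history.append(ours)
    return total / rounds

# Enumerate all cyclic plans of length up to K + 1; the best of these is an
# epsilon-best response to this particular memory-1 opponent (here, always
# cooperating, with long-run value 3, beats defection, which triggers
# mutual punishment).
plans = (tuple(p) for n in range(1, K + 2)
         for p in itertools.product(ACTIONS, repeat=n))
best = max(plans, key=lambda p: average_payoff(p, tit_for_tat))
print(best, average_payoff(best, tit_for_tat))
```

Against a richer class of memory-K opponents the same idea applies, but the search space and the learning problem grow with K, which is what motivates the bounded-memory restriction in the abstract above.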
