Effective learning in the presence of adaptive counterparts

Adaptive learning algorithms (ALAs) are an important class of agents that learn the utilities of their strategies while maintaining beliefs about their counterparts' future actions. In this paper, we propose an approach to learning in the presence of such adaptive counterparts. Our Q-learning based algorithm, called Adaptive Dynamics Learner (ADL), assigns Q-values to fixed-length interaction histories, which makes it capable of exploiting the strategy-update dynamics of adaptive learners. In doing so, ADL usually obtains higher utilities than equilibrium solutions yield. We tested the algorithm on a substantial, representative set of well-known and illustrative matrix games. We observed that ADL is highly effective in the presence of such ALAs as Adaptive Play Q-learning, Infinitesimal Gradient Ascent, Policy Hill-Climbing, and Fictitious Play Q-learning. Moreover, in self-play ADL usually converges to a Pareto-efficient average utility.
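
The abstract gives no pseudocode, so the following is only a minimal sketch of the core idea it describes: standard Q-learning in which the state is the window of the last k joint actions, so the learner can pick up regularities in how an adaptive counterpart updates its strategy. The history length k, the epsilon-greedy exploration, the random stand-in opponent, and the payoff table are illustrative assumptions, not the paper's specification.

```python
import random
from collections import defaultdict

def adl_sketch(payoff, actions, k=2, episodes=10000,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Q-learning keyed by the last k joint actions (a sketch of ADL's idea)."""
    Q = defaultdict(float)        # Q[(history, action)], default 0.0
    history = tuple()             # grows up to length k

    def greedy(h):
        return max(actions, key=lambda a: Q[(h, a)])

    for _ in range(episodes):
        # Epsilon-greedy action selection over the current history state.
        a = random.choice(actions) if random.random() < epsilon else greedy(history)
        # A random opponent stands in here; the paper pits ADL against
        # adaptive learners such as PHC or Fictitious Play Q-learning.
        b = random.choice(actions)
        r = payoff(a, b)
        # The next state is the joint action appended to the history,
        # truncated to the last k entries.
        next_history = (history + ((a, b),))[-k:]
        best_next = max(Q[(next_history, a2)] for a2 in actions)
        Q[(history, a)] += alpha * (r + gamma * best_next - Q[(history, a)])
        history = next_history
    return Q

# Example: row player's payoffs in the iterated Prisoner's Dilemma.
pd = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
Q = adl_sketch(lambda a, b: pd[(a, b)], actions=['C', 'D'])
```

Because the Q-table is indexed by the counterpart's recent actions as well as its own, the learner can condition its play on the opponent's observed adjustment pattern rather than on a single stationary best response.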
