Extending Q-Learning to General Adaptive Multi-Agent Systems

Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed "Hyper-Q" Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.

[1]  C. Watkins Learning from delayed rewards , 1989 .

[2]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[3]  Jörgen W. Weibull,et al.  Evolutionary Game Theory , 1996 .

[4]  Rémi Munos,et al.  A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method , 1997, IJCAI.

[5]  Manuela M. Veloso,et al.  Tree Based Discretization for Continuous State Space Reinforcement Learning , 1998, AAAI/IAAI.

[6]  Michael P. Wellman,et al.  Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.

[7]  M. Nowak,et al.  Evolutionary game theory , 1995, Current Biology.

[8]  Michael H. Bowling,et al.  Convergence Problems of General-Sum Multiagent Reinforcement Learning , 2000, ICML.

[9]  Yishay Mansour,et al.  Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.

[10]  Leslie Pack Kaelbling,et al.  Practical Reinforcement Learning in Continuous Spaces , 2000, ICML.

[11]  Michael L. Littman,et al.  Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.

[12]  Leslie Pack Kaelbling,et al.  Playing is believing: The role of beliefs in multi-agent learning , 2001, NIPS.

[13]  Manuela M. Veloso,et al.  Multiagent learning using a variable learning rate , 2002, Artif. Intell..

[14]  Yishay Mansour,et al.  Efficient Nash Computation in Large Population Games with Bounded Influence , 2002, UAI.

[15]  J. Hosking,et al.  Multiplicative Adjustment of Class Probability : Educating Naı̈ve Bayes , 2002 .