Advice taking in multiagent reinforcement learning

This paper proposes β-WoLF, an algorithm for multiagent reinforcement learning (MARL) that uses an additional "advice" signal to inform agents about mutually beneficial forms of behaviour. β-WoLF extends the WoLF-PHC algorithm so that agents can assess whether the advice obtained through this additional reward signal is (i) useful for the learning agent itself and (ii) currently being followed by other agents in the system. Experimental results indicate that β-WoLF enables cooperation in scenarios where, under existing MARL algorithms, the need to defend oneself against exploitation results in poor coordination.
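
For orientation, the following is a minimal sketch of the baseline WoLF-PHC update that β-WoLF is said to extend, written for a single-state (repeated matrix game) setting. It is not the β-WoLF algorithm: the abstract does not specify how the advice signal is processed, so no advice handling is shown. The class name, parameter names, and default values (alpha, gamma, delta_win, delta_lose) are illustrative assumptions, not taken from the paper.

```python
import random


class WoLFPHC:
    """Single-state WoLF-PHC learner (illustrative sketch, assumes n_actions >= 2)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.0,
                 delta_win=0.01, delta_lose=0.04, seed=None):
        # All default values here are assumptions chosen for illustration.
        self.n = n_actions
        self.alpha = alpha            # Q-learning rate
        self.gamma = gamma            # discount factor
        self.delta_win = delta_win    # small policy step when "winning"
        self.delta_lose = delta_lose  # larger policy step when "losing"
        self.q = [0.0] * n_actions
        self.pi = [1.0 / n_actions] * n_actions      # current mixed policy
        self.avg_pi = [1.0 / n_actions] * n_actions  # running average policy
        self.count = 0
        self.rng = random.Random(seed)

    def act(self):
        # Sample an action from the current mixed policy.
        return self.rng.choices(range(self.n), weights=self.pi)[0]

    def update(self, action, reward):
        # 1. Q-learning update; with one state, the next-state value is max Q.
        target = reward + self.gamma * max(self.q)
        self.q[action] = (1 - self.alpha) * self.q[action] + self.alpha * target

        # 2. Incrementally update the average policy.
        self.count += 1
        for a in range(self.n):
            self.avg_pi[a] += (self.pi[a] - self.avg_pi[a]) / self.count

        # 3. "Win or Learn Fast": learn cautiously when the current policy
        #    outperforms the average policy, and quickly otherwise.
        v_cur = sum(p * q for p, q in zip(self.pi, self.q))
        v_avg = sum(p * q for p, q in zip(self.avg_pi, self.q))
        delta = self.delta_win if v_cur > v_avg else self.delta_lose

        # 4. Hill-climb: shift probability mass towards the greedy action,
        #    never taking more from an action than it currently holds.
        best = max(range(self.n), key=self.q.__getitem__)
        moved = 0.0
        for a in range(self.n):
            if a != best:
                step = min(self.pi[a], delta / (self.n - 1))
                self.pi[a] -= step
                moved += step
        self.pi[best] += moved
```

In use, each agent would call act() to choose an action and then update(action, reward) with the payoff it received. An advice-taking variant would presumably change what enters the reward or how the step size is chosen, but that part is left open here because the abstract gives no details.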
