Using Multi-Agent Options to Reduce Learning Time in Reinforcement Learning

Distributed multi-agent learning has recently received significant interest but has also proven to be very complex, because the outcome of any individual agent's decisions depends on more than those decisions alone. Uncertainty about the decisions and exploration choices of other agents adds complexity and delay to each agent's learning process. To address this complexity and allow distributed multi-agent learning to scale better, this paper extends the options framework and the Nash-Q learning technique to multi-agent distributed learning in non-cooperative game-theoretic settings. We illustrate the effectiveness of this approach in a grid world and demonstrate improved learning for a set of tasks in a semi-cooperative environment.
