Context-Based Concurrent Experience Sharing in Multiagent Systems

One of the key challenges in multi-agent learning is scalability. We introduce a technique for speeding up multi-agent learning by exploiting concurrent and incremental experience sharing. This solution adaptively identifies opportunities to transfer experiences between agents and allows for the rapid acquisition of appropriate policies in large-scale, stochastic, multi-agent systems. Specifically, we present an online, supervisor-directed transfer technique that constructs high-level characterizations of an agent's dynamic learning environment, called contexts, which are used to identify groups of agents operating under approximately similar dynamics within a short temporal window. Supervisory agents compute contextual information for groups of subordinate agents, thereby identifying candidates for experience sharing. We show that our approach yields significant performance gains, that it is robust to noise-corrupted or suboptimal context features, and that communication costs scale linearly with the supervisor-to-subordinate ratio.
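To make the supervisor-directed scheme concrete, the following Python sketch is a minimal, hypothetical rendering of the control flow described above: a supervisor summarizes each subordinate's recent transitions into a context vector, greedily groups subordinates whose contexts are approximately similar, and pools experiences within each group. The specific context features, the distance threshold, and the function names (context_vector, group_by_context, supervisor_step) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def context_vector(transitions):
    """Summarize recent (state, action, reward, next_state) transitions
    into a fixed-length context vector.  Mean reward and mean state
    change are illustrative stand-ins; the paper's context features
    are not reproduced in this sketch."""
    rewards = np.array([r for (_, _, r, _) in transitions])
    deltas = np.array([ns - s for (s, _, _, ns) in transitions])
    return np.concatenate(([rewards.mean()], deltas.mean(axis=0)))

def group_by_context(contexts, tol=0.5):
    """Greedily group agents whose context vectors lie within `tol`
    (Euclidean distance) of a group's first member.  A hypothetical
    placeholder for the supervisor's actual grouping rule."""
    groups = []
    for agent_id, ctx in contexts.items():
        for group in groups:
            if np.linalg.norm(ctx - contexts[group[0]]) < tol:
                group.append(agent_id)
                break
        else:
            groups.append([agent_id])
    return groups

def supervisor_step(buffers):
    """One supervisory cycle: compute a context per subordinate from its
    recent experience buffer, group approximately similar agents, and
    return the pooled experience batch each agent should learn from."""
    contexts = {a: context_vector(b) for a, b in buffers.items()}
    shared = {}
    for group in group_by_context(contexts):
        pooled = [t for a in group for t in buffers[a]]
        for a in group:
            shared[a] = pooled
    return shared

# Usage with synthetic two-dimensional states (illustrative only):
rng = np.random.default_rng(0)
buffers = {
    a: [(rng.normal(size=2), 0, rng.normal(), rng.normal(size=2))
        for _ in range(20)]
    for a in ("agent_0", "agent_1", "agent_2")
}
batches = supervisor_step(buffers)  # each agent gets its group's pooled batch
```

The greedy, threshold-based grouping above is only a placeholder for a proper clustering step over a short temporal window; the sketch is meant to convey the supervise-summarize-group-pool loop, not the method itself.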
