A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems

Multiagent Reinforcement Learning (RL) solves complex tasks that require coordination with other agents through autonomous exploration of the environment. However, learning a complex task from scratch is impractical because of the huge sample complexity of RL algorithms. For this reason, reusing knowledge, whether it comes from previous experience or from other agents, is indispensable for scaling up multiagent RL algorithms. This survey provides a unifying view of the literature on knowledge reuse in multiagent RL. We define a taxonomy of solutions for the general knowledge reuse problem and provide a comprehensive discussion of recent progress on knowledge reuse in Multiagent Systems (MAS) and of techniques for reusing knowledge across agents (which may or may not act in a shared environment). We aim to encourage the community to work towards reusing all the knowledge sources available in a MAS. To that end, we provide an in-depth discussion of current lines of research and open questions.
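As a toy illustration of the sample-complexity argument above, the sketch below shows one simple form of knowledge reuse in a single-agent setting: value-function transfer, where Q-values computed on a small source task warm-start Q-learning on a larger target task through a hand-coded inter-task mapping. The chain task, the mapping, the hyperparameters, and all names (`step`, `value_iteration`, `transfer_q`, `q_learning`) are hypothetical choices made for illustration; this is not a method taken from the survey, and multiagent reuse involves considerably more than this.

```python
import random

# Hypothetical toy task (not from the survey): a deterministic chain of n
# states. The agent starts in state 0 and receives reward 1 on reaching the
# last state, which is terminal.
LEFT, RIGHT = 0, 1


def step(state, action, n):
    """Move one cell left or right; return (next_state, reward, done)."""
    nxt = min(state + 1, n - 1) if action == RIGHT else max(state - 1, 0)
    done = nxt == n - 1
    return nxt, (1.0 if done else 0.0), done


def value_iteration(n, gamma=0.9, sweeps=200):
    """Compute Q-values for the source chain (the 'previous experience')."""
    q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(sweeps):
        for s in range(n - 1):  # the last state is terminal; its row stays zero
            for a in (LEFT, RIGHT):
                s2, r, done = step(s, a, n)
                q[s][a] = r if done else r + gamma * max(q[s2])
    return q


def transfer_q(source_q, n_source, n_target):
    """Warm-start a target Q-table through a hand-coded inter-task mapping.

    Target state t maps to the source state at the same distance from the
    goal; target states farther away than the source chain map to state 0.
    Each row is copied so later updates do not alias the source table.
    """
    offset = n_target - n_source
    return [list(source_q[max(t - offset, 0)]) for t in range(n_target)]


def q_learning(n, q, episodes, epsilon=0.1, alpha=0.5, gamma=0.9,
               max_steps=500, seed=0):
    """Tabular epsilon-greedy Q-learning on the chain; updates q in place.

    Ties between the two actions are broken at random. Returns the number
    of steps taken in each episode.
    """
    rng = random.Random(seed)
    steps_per_episode = []
    for _ in range(episodes):
        s = 0
        for steps in range(1, max_steps + 1):
            if rng.random() < epsilon or q[s][LEFT] == q[s][RIGHT]:
                a = rng.choice((LEFT, RIGHT))  # explore, or break a tie
            else:
                a = RIGHT if q[s][RIGHT] > q[s][LEFT] else LEFT
            s2, r, done = step(s, a, n)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
        steps_per_episode.append(steps)
    return steps_per_episode


if __name__ == "__main__":
    source = value_iteration(5)             # knowledge from a smaller task
    warm = transfer_q(source, 5, 10)        # reused in a larger task
    cold = [[0.0, 0.0] for _ in range(10)]  # learning from scratch
    print("transfer:", q_learning(10, warm, episodes=1, epsilon=0.0))  # [9]
    print("scratch: ", q_learning(10, cold, episodes=1, epsilon=0.0))
```

With greedy action selection, the transferred table already prefers moving right in every non-terminal state, so the first target episode reaches the goal in the minimum 9 steps. The zero-initialised table gives the agent no preference, so its first episode is a random walk that needs at least 9 steps and usually far more. The mapping here is written by hand; much of the transfer literature is concerned with obtaining such mappings, or other reusable knowledge, automatically.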
