Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing

The ability to leverage shared behaviors between tasks is critical for sample-efficient multi-task reinforcement learning (MTRL). While prior methods have primarily explored parameter and data sharing, direct behavior sharing has been limited to task families that require similar behaviors. Our goal is to extend the efficacy of behavior sharing to more general task families that may require a mix of shareable and conflicting behaviors. Our key insight is that an agent's behaviors across tasks can be used for mutually beneficial exploration. To this end, we propose a simple MTRL framework for identifying shareable behaviors across tasks and incorporating them to guide exploration. We empirically demonstrate that behavior sharing improves sample efficiency and final performance on manipulation and navigation MTRL task families, and is even complementary to parameter sharing. Result videos are available at https://sites.google.com/view/qmp-mtrl.
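
The abstract summarizes the framework without spelling out the selection mechanism. As a minimal sketch (not the authors' implementation), one plausible way to "identify shareable behaviors" is to let each task's own Q-function arbitrate among candidate actions proposed by every task's policy: a behavior from another task is executed only when the current task's value estimate prefers it, so conflicting behaviors are filtered out rather than blindly imitated. `TaskPolicy`, `TaskCritic`, and `select_shared_behavior` below are illustrative names and architectures, not the paper's API.

```python
import torch
import torch.nn as nn


class TaskPolicy(nn.Module):
    """Illustrative per-task policy: maps a state to a deterministic action."""

    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.net(state))


class TaskCritic(nn.Module):
    """Illustrative per-task Q-function: scores a (state, action) pair."""

    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)


def select_shared_behavior(state, task_id, policies, critics):
    """Collect one candidate action from every task's policy, then let the
    current task's own Q-function pick the candidate it values most. Sharing
    is therefore selective: other tasks' behaviors guide exploration only
    when the current task's value estimate judges them useful."""
    with torch.no_grad():
        candidates = [policy(state) for policy in policies]
        values = torch.stack([critics[task_id](state, a) for a in candidates])
    return candidates[int(values.argmax())]


# Toy usage: 3 tasks with an 8-D state space and 2-D action space.
policies = [TaskPolicy(8, 2) for _ in range(3)]
critics = [TaskCritic(8, 2) for _ in range(3)]
action = select_shared_behavior(torch.randn(8), task_id=0,
                                policies=policies, critics=critics)
```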
