Multi-task Learning with Gradient Guided Policy Specialization

We present a method for efficiently learning control policies for multiple related robotic motor skills. Our approach consists of two stages: joint training and specialization training. During the joint training stage, a neural network policy is trained with minimal information to disambiguate the motor skills, which forces the policy to learn a common representation of the different tasks. Then, during the specialization training stage, we selectively split the weights of the policy based on a per-weight metric that measures the disagreement among the tasks. By splitting part of the control policy, it can be further trained to specialize to each task. To update the control policy during learning, we use Trust Region Policy Optimization with Generalized Advantage Estimation (TRPO-GAE), and we propose a modification to the gradient update stage of TRPO to better accommodate multi-task learning. We evaluate our approach on three continuous motor skill learning problems in simulation: 1) a locomotion task in which three single-legged robots with considerable differences in shape and size are trained to hop forward, 2) a manipulation task in which three robot manipulators with different sizes and joint types are trained to reach different locations in 3D space, and 3) a locomotion task for a two-legged robot whose range of motion in one leg is constrained in different ways. We compare our training method against three baselines: the first uses only joint training for the policy, the second trains independent policies for each task, and the third randomly selects weights to split. We show that our approach learns more efficiently than each of the baseline methods.
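The abstract does not spell out the exact per-weight disagreement metric, so the following is a minimal sketch under assumptions: it scores each weight by the variance, across tasks, of the length-normalized per-task policy gradients, and marks the top-scoring fraction of weights for splitting. The function names and the `split_fraction` parameter are illustrative, not taken from the paper.

```python
import numpy as np

def gradient_disagreement(task_grads):
    """Per-weight disagreement score across tasks.

    task_grads: list of 1-D arrays, one flattened policy gradient per task.
    Assumed metric: variance across tasks of the length-normalized
    gradients; the paper's exact metric may differ.
    """
    G = np.stack([g / (np.linalg.norm(g) + 1e-8) for g in task_grads])
    return G.var(axis=0)  # one score per weight

def split_mask(task_grads, split_fraction=0.2):
    """Boolean mask over weights: True marks weights to specialize per task."""
    scores = gradient_disagreement(task_grads)
    k = max(1, int(split_fraction * scores.size))
    threshold = np.partition(scores, -k)[-k]  # k-th largest score
    return scores >= threshold
```

Weights flagged by the mask would then be duplicated, one copy per task, and updated only with that task's gradients during specialization training, while the unflagged weights remain shared across all tasks.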
