Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning

One of the great promises of robot learning systems is that they will be able to learn from their mistakes and continuously adapt to ever-changing environments. Despite this potential, most robot learning systems today are deployed as fixed policies that are not adapted after deployment. Can we efficiently adapt previously learned behaviors to new environments, objects, and percepts in the real world? In this paper, we present a method and empirical evidence towards a robot learning framework that facilitates continuous adaptation. In particular, we demonstrate how to adapt vision-based robotic manipulation policies to new variations (changes in background, object shape and appearance, lighting conditions, and robot morphology) by fine-tuning via off-policy reinforcement learning. Further, this adaptation uses less than 0.2% of the data necessary to learn the task from scratch. We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning, and that pre-training via RL is essential: training from scratch and adapting from supervised ImageNet features both fail with such small amounts of data. We also find that these positive results hold in a limited continual learning setting, in which we repeatedly fine-tune a single lineage of policies using data from a succession of new tasks. Our empirical conclusions are consistently supported by experiments on simulated manipulation tasks, and by 52 unique fine-tuning experiments on a real robotic grasping system pre-trained on 580,000 grasps.
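To make the recipe concrete, below is a minimal sketch of what "fine-tuning via off-policy reinforcement learning" can look like in practice: a Q-function pre-trained on the original task continues standard TD learning on a small replay buffer containing only transitions from the changed environment. This is an illustrative assumption, not the authors' released code; the network, the checkpoint path, the discretized action set, and the random-data stand-in for real robot rollouts are all hypothetical (the real system uses continuous actions with CEM-based maximization, QT-Opt style).

```python
"""Minimal sketch (assumptions, not the paper's code) of off-policy fine-tuning:
keep training a pre-trained Q-function on a small buffer of new-task data."""

import copy
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS = 32, 4   # stand-ins for image features / a discretized action set
GAMMA = 0.9                  # discount factor (assumed)
FINETUNE_STEPS = 500         # tiny compared with training from scratch


class QNetwork(nn.Module):
    """Hypothetical stand-in for the pre-trained vision-based Q-function."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

    def forward(self, obs):
        return self.body(obs)


q_net = QNetwork()
# In practice, initialize from the pre-trained checkpoint instead of from scratch:
# q_net.load_state_dict(torch.load("pretrained_qf.pt"))   # hypothetical path
target_q = copy.deepcopy(q_net)   # frozen target network for stable bootstrapping
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

buffer = []   # small replay buffer holding new-variation transitions only


def collect_transition():
    """Placeholder for a real robot rollout on the changed task; random data here."""
    obs = torch.randn(OBS_DIM)
    act = torch.randint(N_ACTIONS, ())
    rew = torch.rand(())
    next_obs = torch.randn(OBS_DIM)
    done = torch.randint(2, ()).float()
    return obs, act, rew, next_obs, done


for step in range(FINETUNE_STEPS):
    buffer.append(collect_transition())
    batch = random.sample(buffer, min(len(buffer), 32))
    obs, act, rew, next_obs, done = (torch.stack(x) for x in zip(*batch))
    with torch.no_grad():
        # Bootstrapped TD target; the max over a discrete action set stands in
        # for the CEM action maximization used with continuous actions.
        target = rew + GAMMA * (1.0 - done) * target_q(next_obs).max(dim=-1).values
    pred = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        target_q.load_state_dict(q_net.state_dict())   # periodic target sync
```

The key design point the abstract emphasizes is in the initialization: because `q_net` starts from RL pre-trained weights rather than random or ImageNet-supervised features, a few hundred gradient steps on a tiny buffer are enough to recover performance under the new variation.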
