Residual Reinforcement Learning from Demonstrations

Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting the control actions of a conventional feedback controller so as to maximize a reward signal. We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations. Learning from images, proprioceptive inputs, and a sparse task-completion reward relaxes the requirement of accessing full state features, such as object and target positions. In addition, replacing the base controller with a policy learned from demonstrations removes the dependency on a hand-engineered controller in favour of a dataset of demonstrations, which can be provided by non-experts. Our experimental evaluation on simulated manipulation tasks with a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations generalizes to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning, and solves high-dimensional, sparse-reward tasks that are out of reach for RL from scratch.
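As background for the residual formulation, the minimal sketch below illustrates the core idea: the executed action is the superposition of a base action, here standing in for a policy cloned from demonstrations, and a learned residual correction trained with RL. The names (ResidualPolicy, base_policy, residual_policy) are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

class ResidualPolicy:
    """Executed action = base action + learned residual correction.

    Hypothetical sketch: `base_policy` stands in for a behavioral-cloning
    policy trained on demonstrations, and `residual_policy` for a network
    trained online with RL on the sparse task-completion reward.
    """

    def __init__(self, base_policy, residual_policy, action_bound=1.0):
        self.base = base_policy          # frozen, demonstration-derived
        self.residual = residual_policy  # trained online with RL
        self.bound = action_bound

    def act(self, observation):
        a_base = np.asarray(self.base(observation))
        a_res = np.asarray(self.residual(observation))
        # Superpose the two actions and keep the result in the valid range.
        return np.clip(a_base + a_res, -self.bound, self.bound)

# Toy usage with stand-in policies on a 6-dimensional action space.
base = lambda obs: np.zeros(6)        # placeholder for a cloned base policy
residual = lambda obs: 0.1 * obs[:6]  # placeholder for an RL residual
policy = ResidualPolicy(base, residual)
action = policy.act(np.ones(12))
```

A common design choice in the residual RL literature is to initialize the residual network so that it initially outputs zeros; training then starts from the base policy's performance, and RL only has to learn corrections rather than the full behaviour.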
