How to train your robot with deep reinforcement learning: lessons we have learned
Sergey Levine | Chelsea Finn | Mrinal Kalakrishnan | Jie Tan | Julian Ibarz | Peter Pastor