Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey

Deep reinforcement learning has recently seen huge success across multiple areas in the robotics domain. Owing to the limitations of gathering real-world data, i.e., sample inefficiency and the cost of collecting it, simulation environments are used to train the different agents. This not only provides a potentially infinite data source, but also alleviates safety concerns with real robots. Nonetheless, the gap between the simulated and real worlds degrades the performance of the policies once the models are transferred to real robots. Multiple research efforts are therefore now being directed towards closing this sim-to-real gap and accomplishing more efficient policy transfer. Recent years have seen the emergence of multiple methods applicable to different domains, but, to the best of our knowledge, there is no comprehensive review that summarizes the different methods and puts them into context. In this survey paper, we cover the fundamental background behind sim-to-real transfer in deep reinforcement learning and give an overview of the main methods currently in use: domain randomization, domain adaptation, imitation learning, meta-learning and knowledge distillation. We categorize some of the most relevant recent works and outline the main application scenarios. Finally, we discuss the main opportunities and challenges of the different approaches and point to the most promising directions.
