Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight

Deep reinforcement learning provides a promising approach for vision-based control of real-world robots. However, the generalization of such models depends critically on the quantity and variety of data available for training. This data can be difficult to obtain for some types of robotic systems, such as fragile, small-scale quadrotors. Simulated rendering and physics can provide much larger datasets, but such data is inherently of lower quality: many of the phenomena that make real-world autonomous flight challenging, such as complex physics and air currents, are modeled poorly or not at all, and the systematic differences between simulation and the real world are typically impossible to eliminate. In this work, we investigate how data from both simulation and the real world can be combined in a hybrid deep reinforcement learning algorithm. Our method uses real-world data to learn about the dynamics of the system, and simulated data to learn a generalizable perception system that enables the robot to avoid collisions using only a monocular camera. We demonstrate our approach on a real-world nano aerial vehicle collision-avoidance task, showing that with only an hour of real-world data, the quadrotor can avoid collisions in new environments with varied lighting conditions and geometry. Code, instructions for building the aerial vehicles, and videos of the experiments can be found at github.com/gkahn13/GtS
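
To make the division of labor concrete, below is a minimal, hypothetical sketch of the kind of architecture the abstract describes: a perception network whose weights would be trained on large amounts of simulated imagery, feeding an action-conditioned collision predictor that would be fine-tuned on the small real-world dataset, with actions chosen by scoring sampled action sequences. The module names, dimensions, horizon, and sampling scheme here are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
# Hypothetical sketch: sim-trained perception + real-data-tuned collision
# prediction for monocular collision avoidance. All names and sizes are
# assumptions for illustration, not the authors' released code.
import torch
import torch.nn as nn

class PerceptionNet(nn.Module):
    """Maps a monocular grayscale image to a compact feature vector.
    Assumed to be trained primarily on simulated images."""
    def __init__(self, feature_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feature_dim)

    def forward(self, image):
        x = self.conv(image).flatten(1)
        return self.fc(x)

class CollisionPredictor(nn.Module):
    """Predicts a per-step collision probability for a candidate action
    sequence, conditioned on the image feature. Assumed to be fine-tuned
    on the real-world flight data so it reflects the true dynamics."""
    def __init__(self, feature_dim=64, action_dim=2, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(action_dim, hidden, batch_first=True)
        self.init_h = nn.Linear(feature_dim, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, feature, actions):
        # actions: (batch, horizon, action_dim)
        h0 = self.init_h(feature).unsqueeze(0)             # (1, batch, hidden)
        out, _ = self.rnn(actions, h0)                      # (batch, horizon, hidden)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # (batch, horizon)

# Action selection by random shooting: sample candidate action sequences and
# execute the one with the lowest total predicted collision probability.
perception = PerceptionNet()
predictor = CollisionPredictor()
image = torch.rand(1, 1, 72, 96)            # placeholder monocular frame
candidates = torch.rand(64, 8, 2) * 2 - 1   # 64 candidate action sequences
feature = perception(image).expand(64, -1)
collision_prob = predictor(feature, candidates).sum(dim=1)
best_sequence = candidates[collision_prob.argmin()]
```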
