CAD$^2$RL: Real Single-Image Flight without a Single Real Image

Deep reinforcement learning has emerged as a promising and powerful technique for automatically acquiring control policies that can process raw sensory inputs, such as images, and perform complex behaviors. However, extending deep RL to real-world robotic tasks has proven challenging, particularly in safety-critical domains such as autonomous flight, where a trial-and-error learning process is often impractical. In this paper, we explore the following question: can we train vision-based navigation policies entirely in simulation, and then transfer them into the real world to achieve real-world flight without a single real training image? We propose a learning method that we call CAD$^2$RL, which can be used to perform collision-free indoor flight in the real world while being trained entirely on 3D CAD models. Our method uses single RGB images from a monocular camera, without needing to explicitly reconstruct the 3D geometry of the environment or perform explicit motion planning. Our learned collision avoidance policy is represented by a deep convolutional neural network that directly processes raw monocular images and outputs velocity commands. This policy is trained entirely on simulated images, with a Monte Carlo policy evaluation algorithm that directly optimizes the network's ability to produce collision-free flight. By highly randomizing the rendering settings for our simulated training set, we show that we can train a policy that generalizes to the real world, without requiring the simulator to be particularly realistic or high-fidelity. We evaluate our method by flying a real quadrotor through indoor environments, and further evaluate the design choices in our simulator through a series of ablation studies on depth prediction. For supplementary video see: this https URL
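The abstract describes a convolutional policy that maps a single monocular RGB image to velocity commands and is trained from Monte Carlo rollouts in heavily randomized simulated environments. The sketch below is a hypothetical illustration of that kind of network, assuming PyTorch; the layer sizes, input resolution, and the 25-way discretization of candidate flight directions are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of a collision-avoidance policy network:
# a small CNN that scores a discretized set of flight directions from one RGB image.
# All architectural choices here are assumptions made for illustration.
import torch
import torch.nn as nn


class CollisionAvoidancePolicy(nn.Module):
    def __init__(self, num_actions: int = 25):
        super().__init__()
        # Convolutional trunk over an assumed (3, 128, 128) simulated RGB image.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, num_actions),  # one score per candidate flight direction
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) in [0, 1]; returns per-direction scores.
        return self.head(self.features(image))


if __name__ == "__main__":
    policy = CollisionAvoidancePolicy()
    scores = policy(torch.rand(1, 3, 128, 128))  # one simulated frame
    action = scores.argmax(dim=1)                # pick the direction judged safest
    print(scores.shape, action)
```

In the paper's setting, the per-direction scores would be fit to estimates of collision-free flight produced by Monte Carlo policy evaluation in the randomized simulator; the argmax above merely stands in for selecting the safest direction at test time.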
