Reactive obstacle avoidance of monocular quadrotors with online adapted depth prediction network

Abstract: Obstacle avoidance based on a monocular camera is a fundamental yet highly challenging task for a quadrotor because a single camera provides no direct 3D information. Methods based on convolutional neural networks (CNNs) [1] for monocular depth estimation and obstacle detection have become increasingly popular owing to the considerable advances in deep learning. However, depth estimation by pre-trained CNNs usually suffers a large loss of accuracy on scenes whose type differs from the training data, a situation that is common when drones avoid obstacles in unknown environments. In this paper, we present a reactive obstacle avoidance system that employs an online adaptive CNN to progressively improve depth estimation from a monocular camera in unfamiliar environments. Pairs of motion stereo images are collected on-the-fly as training data by a direct monocular SLAM system running in parallel with the CNN. Novel approaches are introduced for selecting highly reliable training samples from the noisy data provided by SLAM and for efficient online CNN tuning. The depth map computed by the CNN is transformed into Ego Dynamic Space (EDS) by embedding both the dynamic motion constraints of a quadrotor and the depth estimation errors into the spatial depth map. Traversable waypoints that respect the camera’s field of view constraint are automatically computed in EDS, and appropriate control inputs for the quadrotor are produced from them. Experimental results on public datasets, simulated environments, and unseen cluttered indoor environments demonstrate the effectiveness of our system.

[1] Noah Snavely, et al. Unsupervised Learning of Depth and Ego-Motion from Video, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Xin Yang, et al. Real-time Monocular Dense Mapping for Augmented Reality, 2017, ACM Multimedia.

[3] K. Madhava Krishna, et al. Autonomous navigation of generic monocular quadcopter in natural environment, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[4] Daniel Cremers, et al. Direct Sparse Odometry, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Yizong Cheng, et al. Mean Shift, Mode Seeking, and Clustering, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[6] Shunli Zhang, et al. Monocular depth estimation with guidance of surface normal map, 2017, Neurocomputing.

[7] Juan D. Tardós, et al. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras, 2016, IEEE Transactions on Robotics.

[8] Nicholas Roy, et al. Multi-level mapping: Real-time dense monocular SLAM, 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[9] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Robert C. Bolles, et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, 1981, CACM.

[11] Ashutosh Saxena, et al. High speed obstacle avoidance using monocular vision and reinforcement learning, 2005, ICML.

[12] Pascal Fua, et al. SLIC Superpixels Compared to State-of-the-Art Superpixel Methods, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Nicolas Petit, et al. The Navigation and Control technology inside the AR.Drone micro UAV, 2011.

[14] Nassir Navab, et al. Deeper Depth Prediction with Fully Convolutional Residual Networks, 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[15] Oussama Khatib, et al. Reactive collision avoidance for navigation with dynamic constraints, 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[16] Rob Fergus, et al. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network, 2014, NIPS.

[17] Ian D. Reid, et al. Dense Reconstruction Using 3D Object Shape Priors, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Francisco Angel Moreno, et al. The Málaga urban dataset: High-rate stereo and LiDAR in a realistic urban scenario, 2014, Int. J. Robotics Res.

[19] Brett Browning, et al. Evaluating Pose Estimation Methods for Stereo Visual Odometry on Robots, 2010.

[20] Ashutosh Saxena, et al. Learning Depth from Single Monocular Images, 2005, NIPS.

[21] Ashutosh Saxena, et al. Low-power parallel algorithms for single image based obstacle avoidance in aerial robots, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[22] J. M. M. Montiel, et al. ORB-SLAM: A Versatile and Accurate Monocular SLAM System, 2015, IEEE Transactions on Robotics.

[23] Javier Civera, et al. DPPTAM: Dense piecewise planar tracking and mapping from a monocular sequence, 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[24] Sean L. Bowman, et al. Probabilistic data association for semantic SLAM, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[25] Ian D. Reid, et al. Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26] Sunando Sengupta, et al. Semantic octree: Unifying recognition, reconstruction and representation via an octree constrained higher order MRF, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[27] Hu Tian, et al. Depth estimation with convolutional conditional random field network, 2016, Neurocomputing.

[28] Michael Gassner, et al. SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems, 2017, IEEE Transactions on Robotics.

[29] Daniel Cremers, et al. LSD-SLAM: Large-Scale Direct Monocular SLAM, 2014, ECCV.

[30] Gustavo Carneiro, et al. Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue, 2016, ECCV.

[31] Ashutosh Saxena, et al. Make3D: Learning 3D Scene Structure from a Single Still Image, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32] Federico Tombari, et al. CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Oisin Mac Aodha, et al. Unsupervised Monocular Depth Estimation with Left-Right Consistency, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).