A 3D Grid Mapping System Based on Depth Prediction from a Monocular Camera

In complex unknown 3D environments, an accurate volumetric representation of the surroundings is essential for an intelligent robot, and Simultaneous Localization and Mapping (SLAM) is a fundamental approach to building one. Traditional SLAM methods rely heavily on RGB-D information, yet the depth measurements provided by consumer RGB-D cameras are limited in precision and accuracy, while high-precision sensors such as LiDAR are expensive. Efficient algorithms can compensate for poorly equipped platforms and also reduce the robot's resource consumption. To tackle this trade-off between performance and cost, this paper proposes a system that produces a 3D grid map usable for navigation from only a monocular camera and a small IMU. Our system uses a deep neural network to predict the depth of a monocular image and applies a dynamic frame-hopping strategy to smooth the prediction over time. From the predicted depth we build a 3D grid map that can be used directly for navigation, adopting an octree structure and a keyframe method so that the whole grid-mapping process consumes little computation and storage. Experiments in a real-world environment show that our approach achieves good depth-prediction results and updates the 3D grid map well for navigation.
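The octree-based probabilistic occupancy update mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the log-odds increments, and the fixed-depth cubic volume are all illustrative assumptions, chosen to mirror the standard OctoMap-style scheme in which each voxel's occupancy is stored as a clamped log-odds value that rises on a measurement "hit" and falls on a "miss".

```python
import math

class OctreeNode:
    """Node of a sparse occupancy octree; children are allocated lazily."""
    __slots__ = ("children", "log_odds")

    def __init__(self):
        self.children = None   # None until a child voxel is first touched
        self.log_odds = 0.0    # 0.0 corresponds to P(occupied) = 0.5 (unknown)

class OccupancyOctree:
    """Sparse probabilistic occupancy octree over a cubic volume.

    Each leaf stores a clamped log-odds occupancy value, raised when a
    depth measurement ends in the voxel (hit) and lowered when a ray
    passes through it (miss) -- the usual octree grid-mapping update.
    """
    HIT, MISS = 0.85, -0.4     # log-odds increments (illustrative values)
    L_MIN, L_MAX = -2.0, 3.5   # clamping bounds keep the map updatable

    def __init__(self, size=10.0, depth=8):
        self.root = OctreeNode()
        self.size = size       # edge length of the mapped cube (metres)
        self.depth = depth     # tree depth -> voxel resolution size / 2**depth

    def _leaf(self, x, y, z):
        """Descend to (and lazily create) the leaf containing (x, y, z)."""
        node = self.root
        cx = cy = cz = self.size / 2.0   # centre of the current cell
        half = self.size / 4.0           # offset from parent to child centre
        for _ in range(self.depth):
            # Child index: one bit per axis, chosen by which half the point is in.
            i = (x >= cx) + 2 * (y >= cy) + 4 * (z >= cz)
            if node.children is None:
                node.children = [None] * 8
            if node.children[i] is None:
                node.children[i] = OctreeNode()
            cx += half if x >= cx else -half
            cy += half if y >= cy else -half
            cz += half if z >= cz else -half
            half /= 2.0
            node = node.children[i]
        return node

    def integrate(self, x, y, z, hit=True):
        """Fuse one measurement into the voxel containing (x, y, z)."""
        leaf = self._leaf(x, y, z)
        delta = self.HIT if hit else self.MISS
        leaf.log_odds = max(self.L_MIN, min(self.L_MAX, leaf.log_odds + delta))

    def occupancy(self, x, y, z):
        """Return P(occupied) of the voxel containing the point."""
        return 1.0 / (1.0 + math.exp(-self._leaf(x, y, z).log_odds))
```

Because children are allocated only along rays that were actually observed, unexplored space costs no memory, which is what makes the octree representation attractive for resource-constrained robots.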
