Semi-supervised Depth Estimation from Sparse Depth and a Single Image for Dense Map Construction

For robust navigation, the objective in visual SLAM is to create a dense map from sparse input. Despite numerous efforts on real-time mapping, existing visual SLAM systems still fail to preserve the geometric detail that navigation requires. This paper constructs a dense map by estimating pixel-wise depth from a single image together with a few sparse depth points, obtained either from registered LiDAR scans or from a visual SLAM system. The main idea is a ResNet-based depth estimation network trained with a set of new loss functions: photometric reconstruction consistency (both forward-backward and left-right), a sparse depth loss, geometric consistency with nearby frames, and a smoothness loss. The experimental results show that the proposed method is superior to state-of-the-art methods on both the raw LiDAR scan dataset and the semi-dense annotation dataset. Furthermore, the error of the sparse depth produced by stereo ORB-SLAM2 is evaluated, and this sparse depth together with a single image is fed into the proposed model to further demonstrate the effectiveness of the proposed approach.
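
As a rough illustration of how such a composite objective could be assembled, here is a minimal PyTorch-style sketch. The abstract does not specify exact formulations or weights, so the function names, the L1 photometric term, the edge-aware smoothness form, and the loss weights below are all assumptions; the nearby-frame geometric consistency term is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def sparse_depth_loss(pred, sparse_gt):
    # Supervise only at pixels where a sparse depth point exists
    # (zeros in sparse_gt mark missing measurements; an assumption).
    mask = sparse_gt > 0
    return F.l1_loss(pred[mask], sparse_gt[mask])

def smoothness_loss(depth, image):
    # Edge-aware smoothness: penalize depth gradients, down-weighted
    # at image edges (a common formulation, assumed here).
    d_dx = torch.abs(depth[:, :, :, :-1] - depth[:, :, :, 1:])
    d_dy = torch.abs(depth[:, :, :-1, :] - depth[:, :, 1:, :])
    i_dx = torch.mean(torch.abs(image[:, :, :, :-1] - image[:, :, :, 1:]), 1, keepdim=True)
    i_dy = torch.mean(torch.abs(image[:, :, :-1, :] - image[:, :, 1:, :]), 1, keepdim=True)
    return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()

def photometric_loss(target, warped):
    # Photometric reconstruction consistency between a frame and its
    # reconstruction warped from another view (left-right or
    # forward-backward); plain L1 here for brevity.
    return F.l1_loss(warped, target)

def total_loss(pred_depth, sparse_gt, image, warped_views,
               w_photo=1.0, w_depth=1.0, w_smooth=0.1):
    # Hypothetical weights; the paper's actual values are not given here.
    loss = w_depth * sparse_depth_loss(pred_depth, sparse_gt)
    loss = loss + w_smooth * smoothness_loss(pred_depth, image)
    for warped in warped_views:
        loss = loss + w_photo * photometric_loss(image, warped)
    return loss
```

In practice the warped views would be produced by differentiably reprojecting neighboring stereo or temporal frames through the predicted depth (e.g., with a spatial-transformer-style sampler), which is what ties the photometric terms back to the depth network.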