Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery

Self-supervised learning of egomotion and depth has recently attracted great attention. These models provide pose and depth maps to support navigation and perception tasks for autonomous driving and robotics, and they do not require high-precision ground-truth labels for training. However, monocular vision-based methods suffer from the pose scale-ambiguity problem: they cannot generate physically meaningful trajectories, which limits their real-world applications. We propose a novel self-learning deep neural network framework that learns to estimate egomotion and depth with absolute metric scale from monocular images. A coarse depth scale is recovered by comparing point cloud data against a pretrained model that ensures the consistency of the photometric loss. The scale-ambiguity problem is solved by introducing a novel two-stage coarse-to-fine scale recovery strategy that jointly refines the coarse poses and depths. Our model produces pose and depth estimates in global metric scale, even in low-light conditions, i.e., driving at night. Evaluation on public datasets demonstrates that our model outperforms representative traditional and learning-based visual odometry (VO) and visual-inertial odometry (VIO) methods, e.g., VINS-mono, ORB-SLAM, SC-Learner, and UnVIO.
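The abstract does not say how the point cloud is compared against the network output, so the following is only a minimal sketch of one common way to obtain a coarse global scale: take the median ratio between sparse metric depths and the up-to-scale predicted depths at the same pixels, then apply that factor to both the depth values and the translation of the relative pose. The function names (`coarse_scale`, `apply_scale`), the `min_depth` threshold, and the sample numbers are assumptions for illustration, not the authors' implementation.

```python
from statistics import median


def coarse_scale(pred_depths, metric_depths, min_depth=1e-3):
    """Estimate one global scale as the median of metric/predicted depth ratios.

    pred_depths:   up-to-scale depths predicted at a set of sparse pixels.
    metric_depths: metric depths at the same pixels (for example, from a
                   projected point cloud); None marks a missing measurement.
    """
    ratios = [
        m / p
        for p, m in zip(pred_depths, metric_depths)
        if m is not None and m > min_depth and p > min_depth
    ]
    if not ratios:
        raise ValueError("no valid depth pairs to estimate scale from")
    # The median is less sensitive to outlier points than the mean.
    return median(ratios)


def apply_scale(scale, depths, translation):
    """Rescale depths and the pose translation by the same factor."""
    return [scale * d for d in depths], [scale * t for t in translation]


# Hypothetical sample: predictions are roughly 10x too small.
pred = [0.5, 1.0, 2.0, 4.0]
metric = [5.1, 9.9, None, 40.0]

scale = coarse_scale(pred, metric)
scaled_depths, scaled_translation = apply_scale(scale, pred, [0.1, 0.0, 0.0])
```

Rotation is left untouched because it carries no scale; only depth and translation share the unknown factor. A fine stage, as described in the abstract, would then refine these coarse values jointly rather than relying on a single global ratio.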
