Improving Learning-based Ego-motion Estimation with Homomorphism-based Losses and Drift Correction

Visual odometry is an essential problem for mobile robots. Traditional methods for solving VO mostly utilize geometric optimization. While capable of achieving high accuracy, these methods require accurate sensor calibration and complicated parameter tuning to work well in practice. With the rise of deep learning, there has been increased interest in the end-to-end, learning-based methods for VO, which have the potential to improve robustness. However, learning-based methods for VO so far are less accurate than geometric methods. We argue that one of the main issues is that the current ego-motion estimation task is different from other problems where deep learning has been successful such as object detection. We define a novel cost function for learning-based VO considering the mathematical properties of the group homomorphism. In addition to the standard L2 loss, we incorporate losses based on the identity, inverse and closure properties of SE(3) rigid motion. Furthermore, we propose to reduce the VO drift by estimating the drivable regions using semantic segmentation and incorporate this information into a pose graph optimization. Experiments on KITTI datasets show that the novel cost function can improve ego-motion estimation compared to the state-of-the-art and the drivable region-based correction further reduces the VO drift.

[1]  Julius Ziegler,et al.  StereoScan: Dense 3d reconstruction in real-time , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[2]  Zhichao Yin,et al.  GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  James R. Bergen,et al.  Visual odometry , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[4]  Sen Wang,et al.  DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[5]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[6]  Sen Wang,et al.  End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks , 2018, Int. J. Robotics Res..

[7]  Gabriele Costante,et al.  LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation , 2017, IEEE Robotics and Automation Letters.

[8]  K. Madhava Krishna,et al.  Geometric Consistency for Self-Supervised End-to-End Visual Odometry , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Kurt Konolige,et al.  g 2 o: A general Framework for (Hyper) Graph Optimization , 2011 .

[10]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[11]  Dongbing Gu,et al.  UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[12]  Ian D. Reid,et al.  Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Daniel Cremers,et al.  Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Jian Zhang,et al.  Global Pose Estimation with an Attention-Based Recurrent Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15]  Fabio Tozeto Ramos,et al.  Semi-parametric models for visual odometry , 2012, 2012 IEEE International Conference on Robotics and Automation.

[16]  Paolo Valigi,et al.  Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation , 2016, IEEE Robotics and Automation Letters.

[17]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Anelia Angelova,et al.  Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Tucker R. Balch,et al.  Memory-based learning for visual odometry , 2008, 2008 IEEE International Conference on Robotics and Automation.

[21]  Roberto Cipolla,et al.  MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving , 2016, 2018 IEEE Intelligent Vehicles Symposium (IV).

[22]  Valentin Peretroukhin,et al.  DPC-Net: Deep Pose Correction for Visual Localization , 2017, IEEE Robotics and Automation Letters.

[23]  Clark C. Guest,et al.  High Accuracy Monocular SFM and Scale Correction for Autonomous Driving , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  John J. Leonard,et al.  Towards visual ego-motion learning in robots , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[25]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).