论文信息 - TartanVO: A Generalizable Learning-based VO

TartanVO: A Generalizable Learning-based VO

We present the first learning-based visual odometry (VO) model, which generalizes to multiple datasets and real-world scenarios and outperforms geometry-based methods in challenging scenes. We achieve this by leveraging the SLAM dataset TartanAir, which provides a large amount of diverse synthetic data in challenging environments. Furthermore, to make our VO model generalize across datasets, we propose an up-to-scale loss function and incorporate the camera intrinsic parameters into the model. Experiments show that a single model, TartanVO, trained only on synthetic data, without any finetuning, can be generalized to real-world datasets such as KITTI and EuRoC, demonstrating significant advantages over the geometry-based methods on challenging trajectories. Our code is available at this https URL.

[1] Sen Wang,et al. DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[2] J. M. M. Montiel,et al. ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[3] Wojciech Zaremba,et al. Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[4] Thomas Brox,et al. DeepTAM: Deep Tracking and Mapping , 2018, ECCV.

[5] Simon Lucey,et al. Learning Depth from Monocular Videos Using Direct Methods , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6] Zhichao Yin,et al. GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7] Noah Snavely,et al. Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Cordelia Schmid,et al. SfM-Net: Learning of Structure and Motion from Video , 2017, ArXiv.

[9] José Ruíz Ascencio,et al. Visual simultaneous localization and mapping: a survey , 2012, Artificial Intelligence Review.

[10] Shichao Yang,et al. Improving Learning-based Ego-motion Estimation with Homomorphism-based Losses and Drift Correction , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[11] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Sen Wang,et al. End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks , 2018, Int. J. Robotics Res..

[13] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .

[14] Dongbing Gu,et al. UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[15] Nan Yang,et al. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Hongbin Zha,et al. Self-Supervised Deep Visual Odometry With Online Adaptation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Stefan Leutenegger,et al. LS-Net: Learning to Solve Nonlinear Least Squares for Monocular Stereo , 2018, ECCV.

[18] Paolo Valigi,et al. Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation , 2016, IEEE Robotics and Automation Letters.

[19] Elie A. Shammas,et al. Keyframe-based monocular SLAM: design, survey, and future directions , 2016, Robotics Auton. Syst..

[20] Varun Jampani,et al. Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[21] David Nistér,et al. An efficient solution to the five-point relative pose problem , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Tucker R. Balch,et al. Memory-based learning for visual odometry , 2008, 2008 IEEE International Conference on Robotics and Automation.

[23] Chamara Saroj Weerasekera,et al. Visual Odometry Revisited: What Should Be Learnt? , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[24] Paolo Valigi,et al. Evaluation of non-geometric methods for visual odometry , 2014, Robotics Auton. Syst..

[25] Federico Tombari,et al. CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Daniel Cremers,et al. LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[27] Yun-Hui Liu,et al. Robust and Efficient Estimation of Absolute Camera Pose for Monocular Visual Odometry , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[28] Fabio Tozeto Ramos,et al. Semi-parametric models for visual odometry , 2012, 2012 IEEE International Conference on Robotics and Automation.

[29] Davide Scaramuzza,et al. SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[30] Jan Kautz,et al. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31] Ashish Kapoor,et al. TartanAir: A Dataset to Push the Limits of Visual SLAM , 2020, ArXiv.

[32] Ian D. Reid,et al. Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33] Dan Xu,et al. Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34] Daniel Cremers,et al. Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35] Roland Siegwart,et al. The EuRoC micro aerial vehicle datasets , 2016, Int. J. Robotics Res..

[36] Andreas Geiger,et al. Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[37] Clark C. Guest,et al. High Accuracy Monocular SFM and Scale Correction for Autonomous Driving , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38] Thomas Brox,et al. DeMoN: Depth and Motion Network for Learning Monocular Stereo , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Yi Yang,et al. Supplementary Materials for UnOS: Uniﬁed Unsupervised Optical-ﬂow and Stereo-depth Estimation by Watching Videos , 2019 .

[40] Michael Gassner,et al. SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems , 2017, IEEE Transactions on Robotics.

[41] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Michael J. Black,et al. Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Jia-Bin Huang,et al. DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency , 2018, ECCV.

[44] Ping Tan,et al. BA-Net: Dense Bundle Adjustment Network , 2018, ICLR 2018.

[45] Anelia Angelova,et al. Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.