Paper information - A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence

Deep-learning-based localization and mapping has recently attracted significant attention. Instead of hand-designing algorithms from physical models or geometric theories, deep-learning-based solutions offer a data-driven alternative. Benefiting from ever-increasing volumes of data and computational power, these methods are quickly evolving into a new area that offers accurate and robust systems to track motion and estimate scenes and their structure for real-world applications. In this work, we provide a comprehensive survey and propose a new taxonomy for localization and mapping using deep learning. We also discuss the limitations of current models and indicate possible future directions. A wide range of topics is covered, from learning-based odometry estimation and mapping to global localization and simultaneous localization and mapping (SLAM). We revisit the problem of perceiving self-motion and understanding scenes with on-board sensors, and show how to solve it by integrating these modules into a prospective spatial machine intelligence system (SMIS). We hope that this work can connect emerging work from the robotics, computer vision, and machine learning communities, and serve as a guide for future researchers applying deep learning to localization and mapping problems.


[224] Gordon Wetzstein,et al. Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[225] Roberto Cipolla,et al. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[226] Jitendra Malik,et al. Learning a Multi-View Stereo Machine , 2017, NIPS.

[227] Torsten Sattler,et al. Hybrid Scene Compression for Visual Localization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[228] Simon Lucey,et al. Learning Depth from Monocular Videos Using Direct Methods , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[229] Szymon Rusinkiewicz,et al. Learning to Detect Features in Texture Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[230] Martin Brossard,et al. RINS-W: Robust Inertial Navigation System on Wheels , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[231] Torsten Sattler,et al. InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[232] Thomas Brox,et al. Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[233] Noah Snavely,et al. Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[234] Torsten Sattler,et al. D2-Net: A Trainable CNN for Joint Description and Detection of Local Features , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[235] Andrew J. Davison,et al. DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[236] Agathoniki Trigoni,et al. MotionTransformer: Transferring Neural Inertial Tracking between Domains , 2019, AAAI.

[237] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[238] Ali Farhadi,et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[239] Eric Brachmann,et al. Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[240] Silvio Savarese,et al. Universal Correspondence Network , 2016, NIPS.

[241] Jörg Stückler,et al. Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry , 2018, ECCV.

[242] Paul Vernaza,et al. Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences , 2018, ECCV.

[243] Michael Milford,et al. Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free , 2015, Robotics: Science and Systems.

[244] Stefan Leutenegger,et al. Learning Meshes for Dense Visual SLAM , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[245] Hugh F. Durrant-Whyte,et al. Simultaneous localization and mapping: part I , 2006, IEEE Robotics & Automation Magazine.

[246] Andrew Markham,et al. Deep Neural Network Based Inertial Odometry Using Low-Cost Inertial Measurement Units , 2021, IEEE Transactions on Mobile Computing.

[247] Julius Ziegler,et al. StereoScan: Dense 3d reconstruction in real-time , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[248] James R. Bergen,et al. Visual odometry , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).

[249] Ce Liu,et al. Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[250] Hao Su,et al. A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[251] Shenhua Hou,et al. L3-Net: Towards Learning Based LiDAR Localization for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[252] Ming Cai,et al. Camera Relocalization by Exploiting Multi-View Constraints for Scene Coordinates Regression , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).