A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence

Deep learning based localization and mapping has recently attracted significant attention. Instead of creating hand-designed algorithms through exploitation of physical models or geometric theories, deep learning based solutions provide an alternative to solve the problem in a data-driven way. Benefiting from ever-increasing volumes of data and computational power, these methods are fast evolving into a new area that offers accurate and robust systems to track motion and estimate scenes and their structure for real-world applications. In this work, we provide a comprehensive survey, and propose a new taxonomy for localization and mapping using deep learning. We also discuss the limitations of current models, and indicate possible future directions. A wide range of topics are covered, from learning odometry estimation, mapping, to global localization and simultaneous localization and mapping (SLAM). We revisit the problem of perceiving self-motion and scene understanding with on-board sensors, and show how to solve it by integrating these modules into a prospective spatial machine intelligence system (SMIS). It is our hope that this work can connect emerging works from robotics, computer vision and machine learning communities, and serve as a guide for future researchers to apply deep learning to tackle localization and mapping problems.

[1]  Chunhua Shen,et al.  Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video , 2019, NeurIPS.

[2]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[3]  Juho Kannala,et al.  Hierarchical Scene Coordinate Classification and Regression for Visual Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Sen Wang,et al.  VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem , 2017, AAAI.

[5]  Ruigang Yang,et al.  DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Juho Kannala,et al.  Full-Frame Scene Coordinate Regression for Image-Based Localization , 2018, Robotics: Science and Systems.

[7]  Matthias Nießner,et al.  Shape Completion Using 3D-Encoder-Predictor CNNs and Shape Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  H. C. Longuet-Higgins,et al.  A computer algorithm for reconstructing a scene from two projections , 1981, Nature.

[9]  Robert Harle,et al.  A Survey of Indoor Inertial Positioning Systems for Pedestrians , 2013, IEEE Communications Surveys & Tutorials.

[10]  Torsten Sattler,et al.  Understanding the Limitations of CNN-Based Absolute Camera Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Zhichao Yin,et al.  GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[13]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Gabriela Csurka,et al.  Visual Localization by Learning Objects-Of-Interest Dense Match Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Jan Kautz,et al.  Geometry-Aware Learning of Maps for Camera Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Xin Li,et al.  LO-Net: Deep Real-Time Lidar Odometry , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Chunxia Xiao,et al.  PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Iasonas Kokkinos,et al.  Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Andrew J. Davison,et al.  FutureMapping: The Computational Structure of Spatial AI Systems , 2018, ArXiv.

[20]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Arnaud de La Fortelle,et al.  Deep Sensor Fusion for Real-Time Odometry Estimation , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22]  Long Quan,et al.  ASLFeat: Learning Local Features of Accurate Shape and Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Pascal Fua,et al.  Beyond Cartesian Representations for Local Descriptors , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Dragomir Anguelov,et al.  Scalability in Perception for Autonomous Driving: Waymo Open Dataset , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Lu Fang,et al.  SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Paolo Valigi,et al.  Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation , 2016, IEEE Robotics and Automation Letters.

[28]  Marcelo H. Ang,et al.  2D3D-Matchnet: Learning To Match Keypoints Across 2D Image And 3D Point Cloud , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[29]  Michael Milford,et al.  Meaningful maps with object-oriented semantic mapping , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[30]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[31]  Roland Siegwart,et al.  Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery , 2019, IEEE Robotics and Automation Letters.

[32]  Jitendra Malik,et al.  Hierarchical Surface Prediction for 3D Object Reconstruction , 2017, 2017 International Conference on 3D Vision (3DV).

[33]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[34]  Andrew W. Fitzgibbon,et al.  Multi-output Learning for Camera Relocalization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Shaojie Shen,et al.  VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator , 2017, IEEE Transactions on Robotics.

[36]  Ian D. Reid,et al.  A Hybrid Probabilistic Model for Camera Relocalization , 2018, BMVC.

[37]  Sebastian Thrun,et al.  Probabilistic robotics , 2002, CACM.

[38]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[39]  Qijun Chen,et al.  Scale Recovery for Monocular Visual Odometry Using Depth Estimated with Deep Convolutional Neural Fields , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Pieter Abbeel,et al.  Geometry-Aware Neural Rendering , 2019, NeurIPS.

[41]  Juho Kannala,et al.  Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[42]  Xiaqing Ding,et al.  LocNet: Global Localization in 3D Point Clouds for Mobile Vehicles , 2017, 2018 IEEE Intelligent Vehicles Symposium (IV).

[43]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  Wei Zhang,et al.  Image Based Localization in Urban Environments , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[45]  Keyu Wu,et al.  AbolDeepIO: A Novel Deep Inertial Odometry Network for Autonomous Vehicles , 2020, IEEE Transactions on Intelligent Transportation Systems.

[46]  C. V. Jawahar,et al.  Improved Visual Relocalization by Discovering Anchor Points , 2018, BMVC.

[47]  Torsten Sattler,et al.  To Learn or Not to Learn: Visual Localization from Essential Matrices , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[48]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[49]  Sen Wang,et al.  End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks , 2018, Int. J. Robotics Res..

[50]  John J. Leonard,et al.  Real-time large-scale dense RGB-D SLAM with volumetric fusion , 2014, Int. J. Robotics Res..

[51]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[52]  Yang Li,et al.  Pose Graph optimization for Unsupervised Monocular Visual Odometry , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[53]  Nassir Navab,et al.  Scene Coordinate and Correspondence Learning for Image-Based Localization , 2018, BMVC.

[54]  Agathoniki Trigoni,et al.  IONet: Learning to Cure the Curse of Drift in Inertial Odometry , 2018, AAAI.

[55]  Andrew Markham,et al.  See through smoke: robust indoor mapping with low-cost mmWave radar , 2020, MobiSys.

[56]  Ruigang Yang,et al.  The ApolloScape Open Dataset for Autonomous Driving and Its Application , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Sampath Rangarajan,et al.  TrackIO: Tracking First Responders Inside-Out , 2019, NSDI.

[58]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.

[59]  Ian D. Reid,et al.  Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[60]  Juho Kannala,et al.  Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization , 2018, ECCV Workshops.

[61]  Roberto Cipolla,et al.  Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[62]  Masatoshi Okutomi,et al.  24/7 Place Recognition by View Synthesis , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  Yasin Almalioglu,et al.  Distilling Knowledge From a Deep Pose Regressor Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[64]  Ping Tan,et al.  SANet: Scene Agnostic Network for Camera Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[65]  Tomoya Ishikawa,et al.  PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[66]  Arno Solin,et al.  DEEP LEARNING BASED SPEED ESTIMATION FOR CONSTRAINING STRAPDOWN INERTIAL NAVIGATION ON SMARTPHONES , 2018, 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP).

[67]  V. Lepetit,et al.  EPnP: An Accurate O(n) Solution to the PnP Problem , 2009, International Journal of Computer Vision.

[68]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[69]  Thomas Brox,et al.  DeepTAM: Deep Tracking and Mapping with Convolutional Neural Networks , 2019, International Journal of Computer Vision.

[70]  Roland Siegwart,et al.  Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization , 2018, CoRL.

[71]  Marc Pollefeys,et al.  Matching neural paths: transfer from recognition to correspondence search , 2017, NIPS.

[72]  Torsten Sattler,et al.  DGC-Net: Dense Geometric Correspondence Network , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[73]  Wolfram Burgard,et al.  The limits and potentials of deep learning for robotics , 2018, Int. J. Robotics Res..

[74]  Nan Yang,et al.  D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[75]  Jianping Shi,et al.  CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[76]  Kyle Lindgren,et al.  Unsupervised Deep Visual-Inertial Odometry with Online Error Correction for RGB-D Imagery , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[77]  Hujun Bao,et al.  Prior Guided Dropout for Robust Visual Localization in Dynamic Environments , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[78]  Xiaolin Hu,et al.  Delving deeper into convolutional neural networks for camera relocalization , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[79]  Jianliang Tang,et al.  Complete Solution Classification for the Perspective-Three-Point Problem , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[80]  Torsten Sattler,et al.  Camera Pose Voting for Large-Scale Image-Based Localization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[81]  Dan Xu,et al.  Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[82]  Yasin Almalioglu,et al.  DeepTIO: A Deep Thermal-Inertial Odometry With Visual Hallucination , 2020, IEEE Robotics and Automation Letters.

[83]  Oliver Brock,et al.  Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors , 2018, Robotics: Science and Systems.

[84]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[85]  Davide Scaramuzza,et al.  EVO: A Geometric Approach to Event-Based 6-DOF Parallel Tracking and Mapping in Real Time , 2017, IEEE Robotics and Automation Letters.

[86]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[87]  Torsten Sattler,et al.  Quad-Networks: Unsupervised Learning to Rank for Interest Point Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[88]  Wolfram Burgard,et al.  VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[89]  Eric Brachmann,et al.  Random forests versus Neural Networks — What's best for camera localization? , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[90]  Peter I. Corke,et al.  Visual Place Recognition: A Survey , 2016, IEEE Transactions on Robotics.

[91]  Bo Yang,et al.  DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[92]  Roberto Cipolla,et al.  Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding , 2015, BMVC.

[93]  Torsten Sattler,et al.  Evaluating Local Features for Day-Night Matching , 2016, ECCV Workshops.

[94]  Marc Pollefeys,et al.  From Point Clouds to Mesh Using Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[95]  Roland Siegwart,et al.  From Coarse to Fine: Robust Hierarchical Localization at Large Scale , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[96]  Anastasios I. Mourikis,et al.  High-precision, consistent EKF-based visual-inertial odometry , 2013, Int. J. Robotics Res..

[97]  Friedrich Fraundorfer,et al.  Visual Odometry Part I: The First 30 Years and Fundamentals , 2022 .

[98]  Torsten Sattler,et al.  Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[99]  Luc Van Gool,et al.  RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[100]  Daniel Cremers,et al.  Dense visual SLAM for RGB-D cameras , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[101]  Andrew Zisserman,et al.  DisLocation: Scalable Descriptor Distinctiveness for Location Recognition , 2014, ACCV.

[102]  Tomasz Malisiewicz,et al.  SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[103]  Davide Scaramuzza,et al.  SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[104]  Luigi di Stefano,et al.  On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[105]  Esa Rahtu,et al.  Image-Based Localization Using Hourglass Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[106]  Wolfram Burgard,et al.  Deep Auxiliary Learning for Visual Localization and Odometry , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[107]  Michael Bosse,et al.  Keyframe-based visual–inertial odometry using nonlinear optimization , 2015, Int. J. Robotics Res..

[108]  Torsten Sattler,et al.  Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[109]  Raia Hadsell,et al.  Learning to Navigate in Cities Without a Map , 2018, NeurIPS.

[110]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[111]  Adam Herout,et al.  CNN for IMU assisted odometry estimation using velodyne LiDAR , 2017, 2018 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC).

[112]  Wolfram Burgard,et al.  Deep regression for monocular camera-based 6-DoF global localization in outdoor environments , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[113]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[114]  Wei Wu,et al.  Selective Sensor Fusion for Neural Visual-Inertial Odometry , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[115]  U SaputraMuhamad Risqi,et al.  Visual SLAM and Structure from Motion in Dynamic Environments , 2018 .

[116]  Yue Wang,et al.  Deep Closest Point: Learning Representations for Point Cloud Registration , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[117]  Guoquan Huang,et al.  Lightweight Unsupervised Deep Loop Closure , 2018, Robotics: Science and Systems.

[118]  Lei Yang,et al.  Bringing IoT to Sports Analytics , 2017, NSDI.

[119]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[120]  Shuda Li,et al.  RelocNet: Continuous Metric Learning Relocalisation Using Neural Nets , 2018, ECCV.

[121]  Yasin Almalioglu,et al.  GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[122]  Roland Memisevic,et al.  Learning Visual Odometry with a Convolutional Network , 2015, VISAPP.

[123]  Daniel Cremers,et al.  Image-Based Localization Using LSTMs for Structured Feature Correlation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[124]  Stefan Leutenegger,et al.  CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[125]  Weisi Lin,et al.  Cascaded Parallel Filtering for Memory-Efficient Image-Based Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[126]  Neil D. Lawrence,et al.  WiFi-SLAM Using Gaussian Process Latent Variable Models , 2007, IJCAI.

[127]  Wolfram Burgard,et al.  A Tutorial on Graph-Based SLAM , 2010, IEEE Intelligent Transportation Systems Magazine.

[128]  Long Quan,et al.  D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[129]  Shiyu Song,et al.  DeepVCP: An End-to-End Deep Neural Network for Point Cloud Registration , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[130]  Christopher Zach,et al.  Synthetic View Generation for Absolute Pose Regression and Image Synthesis , 2018, BMVC.

[131]  Henrik Karstoft,et al.  UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor , 2019, ArXiv.

[132]  Anelia Angelova,et al.  Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[133]  Tomás Pajdla,et al.  Neighbourhood Consensus Networks , 2018, NeurIPS.

[134]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[135]  Alex Kendall,et al.  What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[136]  Björn Stenger,et al.  3D Scene Mesh from CNN Depth Predictions and Sparse Monocular SLAM , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[137]  Daniel Cremers,et al.  Direct Sparse Visual-Inertial Odometry Using Dynamic Marginalization , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[138]  Torsten Sattler,et al.  Semantic Visual Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[139]  Tom Schaul,et al.  Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.

[140]  Frank Dellaert,et al.  On-Manifold Preintegration for Real-Time Visual--Inertial Odometry , 2015, IEEE Transactions on Robotics.

[141]  Stefan Leutenegger,et al.  SemanticFusion: Dense 3D semantic mapping with convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[142]  Hesheng Wang,et al.  Loop closure detection using supervised and unsupervised deep neural networks for monocular SLAM systems , 2020, Robotics Auton. Syst..

[143]  Gabriela Csurka,et al.  R2D2: Repeatable and Reliable Detector and Descriptor , 2019, ArXiv.

[144]  Andrew Zisserman,et al.  Learning Local Feature Descriptors Using Convex Optimisation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[145]  Torsten Sattler,et al.  Fast image-based localization using direct 2D-to-3D matching , 2011, 2011 International Conference on Computer Vision.

[146]  Sen Wang,et al.  VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[147]  Niko Sünderhauf,et al.  On the performance of ConvNet features for place recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[148]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[149]  Roberto Cipolla,et al.  Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning , 2017, IJCAI.

[150]  Paul Newman,et al.  1 year, 1000 km: The Oxford RobotCar dataset , 2017, Int. J. Robotics Res..

[151]  Hongbin Zha,et al.  Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[152]  Horst Bischof,et al.  OctNetFusion: Learning Depth Fusion from Data , 2017, 2017 International Conference on 3D Vision (3DV).

[153]  Changchang Wu,et al.  Towards Linear-Time Incremental Structure from Motion , 2013, 2013 International Conference on 3D Vision.

[154]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[155]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[156]  Andrew J. Davison,et al.  DeepFactors: Real-Time Probabilistic Dense Monocular SLAM , 2020, IEEE Robotics and Automation Letters.

[157]  Mathieu Aubry,et al.  A Papier-Mache Approach to Learning 3D Surface Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[158]  Hongbin Zha,et al.  Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[159]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[160]  Shiguo Lian,et al.  DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[161]  Syamsiah Mashohor,et al.  CNN-SVO: Improving the Mapping in Semi-Direct Visual Odometry Using Single-Image Depth Prediction , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[162]  Gim Hee Lee,et al.  PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[163]  Eric Brachmann,et al.  Learning Less is More - 6D Camera Localization via 3D Surface Regression , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[164]  Sergey Levine,et al.  Backprop KF: Learning Discriminative Deterministic State Estimators , 2016, NIPS.

[165]  Raquel Urtasun,et al.  Learning to Localize Using a LiDAR Intensity Map , 2018, CoRL.

[166]  Jan-Michael Frahm,et al.  Recurrent Neural Network for (Un-)Supervised Learning of Monocular Video Visual Odometry and Depth , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[167]  Pascal Fua,et al.  LF-Net: Learning Local Features from Images , 2018, NeurIPS.

[168]  Andrew Markham,et al.  AtLoc: Attention Guided Camera Localization , 2020, AAAI.

[169]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[170]  Li Sun,et al.  Learning Monocular Visual Odometry with Dense 3D Mapping from Dense 3D Flow , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[171]  Sen Wang,et al.  DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[172]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[173]  Eric Brachmann,et al.  Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC , 2020, ArXiv.

[174]  Eric Brachmann,et al.  Expert Sample Consensus Applied to Camera Re-Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[175]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[176]  John J. Leonard,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[177]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[178]  Sen Wang,et al.  Learning Monocular Visual Odometry through Geometry-Aware Curriculum Learning , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[179]  Hugh Durrant-Whyte,et al.  Simultaneous localization and mapping (SLAM): part II , 2006 .

[180]  Chamara Saroj Weerasekera,et al.  Visual Odometry Revisited: What Should Be Learnt? , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[181]  Long Quan,et al.  KFNet: Learning Temporal Camera Relocalization Using Kalman Filtering , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[182]  Torsten Sattler,et al.  Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[183]  Federico Tombari,et al.  CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[184]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[185]  Pascal Fua,et al.  Worldwide Pose Estimation Using 3D Point Clouds , 2012, ECCV.

[186]  Daniel P. Huttenlocher,et al.  Location Recognition Using Prioritized Feature Matching , 2010, ECCV.

[187]  Tao Zhang,et al.  Unsupervised learning to detect loops using deep neural networks for visual SLAM system , 2017, Auton. Robots.

[188]  Yiannis Demiris,et al.  D2D: Keypoint Extraction with Describe to Detect Approach , 2020, ACCV.

[189]  Matthias Nießner,et al.  Scan2Mesh: From Unstructured Range Scans to 3D Meshes , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[190]  Thomas Brox,et al.  DeMoN: Depth and Motion Network for Learning Monocular Stereo , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[191]  Gaurav S. Sukhatme,et al.  Estimating Metric Scale Visual Odometry from Videos using 3D Convolutional Networks , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[192]  Andrew Markham,et al.  CARACAL: a versatile passive acoustic monitoring tool for wildlife research and conservation , 2019, Bioacoustics.

[193]  Marc Pollefeys,et al.  Privacy Preserving Image-Based Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[194]  Andrea Vedaldi,et al.  Supervising the New with the Old: Learning SFM from SFM , 2018, ECCV.

[195]  Stefan Leutenegger,et al.  Fusion++: Volumetric Object-Level SLAM , 2018, 2018 International Conference on 3D Vision (3DV).

[196]  Qi Shan,et al.  RIDI: Robust IMU Double Integration , 2017, ECCV.

[197]  Stefan Leutenegger,et al.  LS-Net: Learning to Solve Nonlinear Least Squares for Monocular Stereo , 2018, ECCV.

[198]  Michael Milford,et al.  Convolutional Neural Network-based Place Recognition , 2014, ICRA 2014.

[199]  Krystian Mikolajczyk,et al.  Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[200]  Eric Brachmann,et al.  DSAC — Differentiable RANSAC for Camera Localization , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[201]  Quanshi Zhang,et al.  Visual interpretability for deep learning: a survey , 2018, Frontiers of Information Technology & Electronic Engineering.

[202]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[203]  Yang Li,et al.  Simultaneous Localization and Mapping with Power Network Electromagnetic Field , 2018, MobiCom.

[204]  Ping Tan,et al.  BA-Net: Dense Bundle Adjustment Network , 2018, ICLR 2018.

[205]  Duc Thanh Nguyen,et al.  LCD: Learned Cross-Domain Descriptors for 2D-3D Matching , 2019, AAAI.

[206]  Krystian Mikolajczyk,et al.  Learning local feature descriptors with triplets and shallow convolutional neural networks , 2016, BMVC.

[207]  Jörg Stückler,et al.  Multi-view deep learning for consistent semantic mapping with RGB-D cameras , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[208]  Hongbin Zha,et al.  Local Supports Global: Deep Camera Relocalization With Sequence Enhancement , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[209]  Christopher R Fetsch,et al.  Dynamic Reweighting of Visual and Vestibular Cues during Self-Motion Perception , 2009, The Journal of Neuroscience.

[210]  Shadi Albarqouni,et al.  Adversarial Networks for Camera Pose Regression and Refinement , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[211]  Anelia Angelova,et al.  Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos , 2018, AAAI.

[212]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.

[213]  Razvan Pascanu,et al.  Learning to Navigate in Complex Environments , 2016, ICLR.

[214]  Koray Kavukcuoglu,et al.  Neural scene representation and rendering , 2018, Science.

[215]  Kathleen E. Cullen,et al.  The vestibular system: multimodal integration and encoding of self-motion for motor control , 2012, Trends in Neurosciences.

[216]  Daniel Cremers,et al.  Semi-dense Visual Odometry for a Monocular Camera , 2013, 2013 IEEE International Conference on Computer Vision.

[217]  Ji Zhang,et al.  LOAM: Lidar Odometry and Mapping in Real-time , 2014, Robotics: Science and Systems.

[218]  Jonathan Kelly,et al.  LSTM-Based Zero-Velocity Detection for Robust Inertial Navigation , 2018, 2018 International Conference on Indoor Positioning and Indoor Navigation (IPIN).

[219]  Sungjin Ahn,et al.  Neural Multisensory Scene Inference , 2019, NeurIPS.

[220]  Vincent Lepetit,et al.  Learning to Find Good Correspondences , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[221]  Dongbing Gu,et al.  UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[222]  Dieter Fox,et al.  DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks , 2017, Robotics: Science and Systems.

[223]  Ingmar Posner,et al.  Driven to Distraction: Self-Supervised Distractor Learning for Robust Monocular Visual Odometry in Urban Environments , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[224]  Gordon Wetzstein,et al.  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[225]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[226]  Jitendra Malik,et al.  Learning a Multi-View Stereo Machine , 2017, NIPS.

[227]  Torsten Sattler,et al.  Hybrid Scene Compression for Visual Localization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[228]  Simon Lucey,et al.  Learning Depth from Monocular Videos Using Direct Methods , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[229]  Szymon Rusinkiewicz,et al.  Learning to Detect Features in Texture Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[230]  Martin Brossard,et al.  RINS-W: Robust Inertial Navigation System on Wheels , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[231]  Torsten Sattler,et al.  InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[232]  Thomas Brox,et al.  Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[233]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[234]  Torsten Sattler,et al.  D2-Net: A Trainable CNN for Joint Description and Detection of Local Features , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[235]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[236]  Agathoniki Trigoni,et al.  MotionTransformer: Transferring Neural Inertial Tracking between Domains , 2019, AAAI.

[237]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[238]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[239]  Eric Brachmann,et al.  Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[240]  Silvio Savarese,et al.  Universal Correspondence Network , 2016, NIPS.

[241]  Jörg Stückler,et al.  Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry , 2018, ECCV.

[242]  Paul Vernaza,et al.  Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences , 2018, ECCV.

[243]  Michael Milford,et al.  Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free , 2015, Robotics: Science and Systems.

[244]  Stefan Leutenegger,et al.  Learning Meshes for Dense Visual SLAM , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[245]  Hugh F. Durrant-Whyte,et al.  Simultaneous localization and mapping: part I , 2006, IEEE Robotics & Automation Magazine.

[246]  Andrew Markham,et al.  Deep Neural Network Based Inertial Odometry Using Low-Cost Inertial Measurement Units , 2021, IEEE Transactions on Mobile Computing.

[247]  Julius Ziegler,et al.  StereoScan: Dense 3d reconstruction in real-time , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[248]  James R. Bergen,et al.  Visual odometry , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[249]  Ce Liu,et al.  Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[250]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[251]  Shenhua Hou,et al.  L3-Net: Towards Learning Based LiDAR Localization for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[252]  Ming Cai,et al.  Camera Relocalization by Exploiting Multi-View Constraints for Scene Coordinates Regression , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).