Large-scale, real-time visual–inertial localization revisited

The overarching goals in image-based localization are scale, robustness, and speed. In recent years, approaches based on local features and sparse 3D point-cloud models have both dominated the benchmarks and seen successful real-world deployment. They enable applications ranging from robot navigation, autonomous driving, and virtual and augmented reality to device geo-localization. Recently, end-to-end learned localization approaches have been proposed that show promising results on small-scale datasets. However, the positioning accuracy, scalability, latency, and compute and storage requirements of these approaches remain open challenges. Because we aim to deploy localization at a global scale, we rely on methods that use local features and sparse 3D models. Our approach spans the full pipeline from offline model building to real-time client-side pose fusion. The system compresses the appearance and geometry of the scene for efficient model storage and lookup, leading to scalability beyond what has been demonstrated previously. By combining server-side localization with real-time visual–inertial camera pose tracking, it allows low-latency localization queries and efficient fusion to run in real time on mobile platforms. To further improve efficiency, we leverage a combination of priors, nearest-neighbor search, geometric match culling, and a cascaded pose candidate refinement step. This combination outperforms previous approaches when working with large-scale models and allows deployment at unprecedented scale. We demonstrate the effectiveness of our approach on a proof-of-concept system that localizes 2.5 million images against models from four cities in different regions of the world, achieving query latencies in the 200 ms range.
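To make the idea of combining server-side localization with client-side tracking concrete, the following is a minimal sketch of one generic way to compensate for query latency. It is not the fusion method of the paper, which the abstract does not specify. It assumes poses are 4x4 rigid transforms, that the client keeps the tracker pose recorded when the query image was captured, and that the server answer arrives later; all function and variable names (`align_local_to_global`, `T_global_local`, and so on) are hypothetical.

```python
import numpy as np


def make_pose(yaw_rad, translation):
    """Build a 4x4 rigid transform from a yaw angle about z and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


def invert_pose(T):
    """Invert a rigid transform using the transpose of its rotation block."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def align_local_to_global(T_global_cam_query, T_local_cam_query):
    """Estimate the transform from the tracker's local frame to the global map.

    Both inputs describe the same camera at the query timestamp: one as
    returned by the (assumed) server, one as recorded by the local tracker.
    """
    return T_global_cam_query @ invert_pose(T_local_cam_query)


def current_global_pose(T_global_local, T_local_cam_now):
    """Express the latest tracker pose in the global frame."""
    return T_global_local @ T_local_cam_now


# Simulated scenario: the true local-to-global alignment is unknown to the client.
T_global_local_true = make_pose(0.3, [100.0, -20.0, 5.0])
T_local_cam_query = make_pose(0.10, [1.0, 2.0, 0.0])  # tracker pose at query time
T_local_cam_now = make_pose(0.25, [1.5, 2.4, 0.0])    # tracker pose when the answer arrives

# Simulated server answer for the query image (noise-free for illustration).
T_global_cam_query = T_global_local_true @ T_local_cam_query

T_global_local = align_local_to_global(T_global_cam_query, T_local_cam_query)
T_global_cam_now = current_global_pose(T_global_local, T_local_cam_now)
```

Because the alignment is computed from poses at the query timestamp, the device can keep tracking locally while the query is in flight and still obtain an up-to-date global pose once the answer arrives. A real system would additionally have to handle noisy localizations and tracker drift, which this sketch ignores.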
