Semantic Match Consistency for Long-Term Visual Localization

Robust and accurate visual localization under large appearance variations, caused by changes in time of day, season, or the environment itself, is a challenging problem that matters for applications such as autonomous robot navigation. Traditional feature-based methods often struggle in these conditions because of the large number of erroneous matches between the image and the 3D model. In this paper, we present a method for scoring individual correspondences by exploiting semantic information about the query image and the scene. Erroneous correspondences thus tend to receive a low semantic consistency score, whereas correct correspondences tend to receive a high one. By incorporating this information into a standard localization pipeline, we show that localization performance can be significantly improved compared to the state of the art, as evaluated on two challenging long-term localization benchmarks.
