Paper information - Semantic Match Consistency for Long-Term Visual Localization

Robust and accurate visual localization under large appearance variations, caused by changes in time of day, season, or the environment itself, is a challenging problem that matters for applications such as autonomous robot navigation. Traditional feature-based methods often struggle in these conditions because of the large number of erroneous matches between the image and the 3D model. In this paper, we present a method for scoring individual correspondences by exploiting semantic information about the query image and the scene. Erroneous correspondences then tend to receive a low semantic consistency score, whereas correct correspondences tend to receive a high one. By incorporating this information into a standard localization pipeline, we show that localization performance can be significantly improved over the state of the art, as evaluated on two challenging long-term localization benchmarks.
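
The abstract does not spell out how the scores enter the pipeline. One plausible reading, assumed here purely for illustration, is that a RANSAC-style pose solver draws its minimal sets with probability proportional to each correspondence's score, so that low-scoring matches are rarely sampled. The function name `sample_minimal_set`, the score values, and the choice of three correspondences per sample are hypothetical and not taken from the paper.

```python
import random


def sample_minimal_set(scores, k=3, rng=None):
    """Draw k distinct correspondence indices, favouring high scores.

    Assumption: `scores` holds one non-negative semantic consistency
    score per 2D-3D correspondence (hypothetical input format).
    """
    rng = rng or random.Random()
    if k > sum(1 for s in scores if s > 0):
        raise ValueError("not enough correspondences with positive score")

    remaining = list(range(len(scores)))   # indices still available
    weights = [float(s) for s in scores]   # sampling weights, kept aligned
    chosen = []
    for _ in range(k):
        # Weighted draw over the remaining correspondences.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))  # remove so it cannot repeat
        weights.pop(pos)
    return chosen


if __name__ == "__main__":
    # Hypothetical scores: correspondences 0 and 2 are semantically
    # inconsistent (score 0) and can never be sampled.
    scores = [0.0, 5.0, 0.0, 2.0, 1.0]
    print(sample_minimal_set(scores, k=3, rng=random.Random(0)))
```

In a full pipeline each sampled set would be passed to a minimal pose solver and the resulting pose verified against all correspondences. Those steps are omitted because the abstract does not describe them.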
