论文信息 - On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

Benchmark datasets that measure camera pose accuracy have driven progress in visual re-localisation research. To obtain poses for thousands of images, it is common to use a reference algorithm to generate pseudo ground truth. Popular choices include Structure-from-Motion (SfM) and Simultaneous-Localisation-and-Mapping (SLAM) using additional sensors like depth cameras if available. Re-localisation benchmarks thus measure how well each method replicates the results of the reference algorithm. This begs the question whether the choice of the reference algorithm favours a certain family of re-localisation methods. This paper analyzes two widely used re-localisation datasets and shows that evaluation outcomes indeed vary with the choice of the reference algorithm. We thus question common beliefs in the re-localisation literature, namely that learning-based scene coordinate regression outperforms classical feature-based methods, and that RGB-Dbased methods outperform RGB-based methods. We argue that any claims on ranking re-localisation methods should take the type of the reference algorithm, and the similarity of the methods to the reference algorithm, into account.

[1] Roberto Cipolla,et al. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2] Robert C. Bolles,et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[3] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[4] Masatoshi Okutomi,et al. 24/7 Place Recognition by View Synthesis , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Andreas Geiger,et al. Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Eric Brachmann,et al. Expert Sample Consensus Applied to Camera Re-Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Roberto Cipolla,et al. Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Vladlen Koltun,et al. Open3D: A Modern Library for 3D Data Processing , 2018, ArXiv.

[9] Paul Newman,et al. 1 year, 1000 km: The Oxford RobotCar dataset , 2017, Int. J. Robotics Res..

[10] Eric Brachmann,et al. Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC , 2020, ArXiv.

[11] Dieter Schmalstieg,et al. Real-time self-localization from panoramic images on mobile devices , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[12] Dieter Fox,et al. Self-Supervised Visual Descriptor Learning for Dense Correspondence , 2017, IEEE Robotics and Automation Letters.

[13] W. Kabsch. A solution for the best rotation to relate two sets of vectors , 1976 .

[14] Wolfgang Förstner,et al. Photogrammetric Computer Vision: Statistics, Geometry, Orientation and Reconstruction , 2017 .

[15] Pascal Fua,et al. On benchmarking camera calibration and multi-view stereo for high resolution imagery , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Eric Brachmann,et al. Learning Less is More - 6D Camera Localization via 3D Surface Regression , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17] Torsten Sattler,et al. InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[19] Eric Brachmann,et al. Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Luigi di Stefano,et al. On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Gabriela Csurka,et al. Robust Image Retrieval-based Visual Localization using Kapture , 2020, ArXiv.

[22] Torsten Sattler,et al. Understanding the Limitations of CNN-Based Absolute Camera Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Andrew J. Davison,et al. DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[24] Jitendra Malik,et al. Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25] Juho Kannala,et al. Hierarchical Scene Coordinate Classification and Regression for Visual Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Torsten Sattler,et al. Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes , 2020, ECCV.

[27] Andreas Geiger,et al. Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[28] Roland Siegwart,et al. The EuRoC micro aerial vehicle datasets , 2016, Int. J. Robotics Res..

[29] Jan-Michael Frahm,et al. A Vote-and-Verify Strategy for Fast Spatial Verification in Image Retrieval , 2016, ACCV.

[30] Daniel P. Huttenlocher,et al. Location Recognition Using Prioritized Feature Matching , 2010, ECCV.

[31] Xin Chen,et al. City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[32] Robert M. Haralick,et al. Review and analysis of solutions of the three point perspective pose estimation problem , 1994, International Journal of Computer Vision.

[33] Torsten Sattler,et al. BAD SLAM: Bundle Adjusted Direct RGB-D SLAM , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Wolfram Burgard,et al. A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[35] Torsten Sattler,et al. Comparative Evaluation of Hand-Crafted and Learned Local Features , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Torsten Sattler,et al. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features , 2019, CVPR 2019.

[37] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38] Torsten Sattler,et al. Image Retrieval for Image-Based Localization Revisited , 2012, BMVC.

[39] James J. Little,et al. Backtracking regression forests for accurate camera relocalization , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[40] David G. Lowe,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[41] Roland Siegwart,et al. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation , 2011, CVPR 2011.

[42] David W. Murray,et al. Video-rate localization in multiple maps for wearable augmented reality , 2008, 2008 12th IEEE International Symposium on Wearable Computers.

[43] Ping Tan,et al. SANet: Scene Agnostic Network for Camera Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44] Daniel Cremers,et al. Image-Based Localization Using LSTMs for Structured Feature Correlation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[45] Jan Kautz,et al. Geometry-Aware Learning of Maps for Camera Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46] Gonen Eren,et al. Evaluation of video activity localizations integrating quality and quantity measurements , 2014, Comput. Vis. Image Underst..

[47] Liang Wang,et al. A Dataset for Benchmarking Image-Based Localization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Tomasz Malisiewicz,et al. SuperGlue: Learning Feature Matching With Graph Neural Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49] Torsten Sattler,et al. Project AutoVision: Localization and 3D Scene Perception for an Autonomous Vehicle with a Multi-Camera System , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[50] Andrew W. Fitzgibbon,et al. Exploiting uncertainty in regression forests for accurate camera relocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Gabriela Csurka,et al. R2D2: Repeatable and Reliable Detector and Descriptor , 2019, ArXiv.

[52] Tomasz Malisiewicz,et al. SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[53] Andrew W. Fitzgibbon,et al. KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[54] Changchang Wu,et al. Towards Linear-Time Incremental Structure from Motion , 2013, 2013 International Conference on 3D Vision.

[55] Andrew W. Fitzgibbon,et al. Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[56] Jiri Matas,et al. Fixing the Locally Optimized RANSAC , 2012, BMVC.

[57] Andrew W. Fitzgibbon,et al. Bundle Adjustment - A Modern Synthesis , 1999, Workshop on Vision Algorithms.

[58] Eric Brachmann,et al. DSAC — Differentiable RANSAC for Camera Localization , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59] Giorgos Tolias,et al. Fine-Tuning CNN Image Retrieval with No Human Annotation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60] Eric Brachmann,et al. Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[61] Richard Szeliski,et al. Modeling the World from Internet Photo Collections , 2008, International Journal of Computer Vision.

[62] Michael F. Cohen,et al. Real-time image-based 6-DOF localization in large-scale environments , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[63] Torsten Sattler,et al. Improving Image-Based Localization by Active Correspondence Search , 2012, ECCV.

[64] Pascal Fua,et al. Image Matching Across Wide Baselines: From Paper to Practice , 2020, International Journal of Computer Vision.

[65] Daniel Kondermann,et al. Is Crowdsourcing for Optical Flow Ground Truth Generation Feasible? , 2013, ICVS.

[66] Torsten Sattler,et al. Benchmarking 6DOF Urban Visual Localization in Changing Conditions , 2017, ArXiv.

[67] Torsten Sattler,et al. Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization? , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68] Matthias Nießner,et al. Learning to Navigate the Energy Landscape , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[69] Marc Levoy,et al. Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[70] Michael Bosse,et al. Get Out of My Lab: Large-scale, Real-Time Visual-Inertial Localization , 2015, Robotics: Science and Systems.

[71] Roland Siegwart,et al. From Coarse to Fine: Robust Hierarchical Localization at Large Scale , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[72] Pascal Fua,et al. Worldwide Pose Estimation Using 3D Point Clouds , 2012, ECCV.

[73] Torsten Sattler,et al. A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74] Matthias Nießner,et al. BundleFusion , 2016, TOGS.

[75] Andrew W. Fitzgibbon,et al. KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[76] Jianguo Zhang,et al. The PASCAL Visual Object Classes Challenge , 2006 .

[77] Luca Bertinetto,et al. Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation , 2019, 2019 International Conference on 3D Vision (3DV).

[78] Andrew W. Fitzgibbon,et al. Multi-output Learning for Camera Relocalization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[79] Jan-Michael Frahm,et al. Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[80] Torsten Sattler,et al. Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis , 2020, Int. J. Comput. Vis..

[81] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[82] Torsten Sattler,et al. Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[83] Luigi di Stefano,et al. Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[84] Andrew Zisserman,et al. Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[85] Jianliang Tang,et al. Complete Solution Classification for the Perspective-Three-Point Problem , 2003, IEEE Trans. Pattern Anal. Mach. Intell..