Image Matching Across Wide Baselines: From Paper to Practice

We introduce a comprehensive benchmark for local features and robust estimation algorithms. Its primary metric is the downstream task, namely the accuracy of the reconstructed camera pose. The modular structure of our pipeline makes it easy to integrate, configure, and combine different methods and heuristics. We demonstrate this by embedding dozens of popular algorithms, ranging from seminal works to the cutting edge of machine learning research, and evaluating them. We show that, with proper settings, classical solutions may still outperform the perceived state of the art. Beyond establishing the actual state of the art, our experiments reveal unexpected properties of Structure from Motion (SfM) pipelines that can help improve their performance, for both algorithmic and learned methods. Data and code are available online at this https URL. Together they provide an easy-to-use and flexible framework for benchmarking local features and robust estimation methods, both alongside and against top-performing methods. This work provides a basis for the Image Matching Challenge at this https URL.
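The pose-accuracy metric can be illustrated with a short NumPy sketch. It makes several assumptions for illustration only. Accuracy is scored with two angular errors, one for the rotation and one for the translation direction, because scale cannot be recovered from two views. A pair's error is taken as the larger of the two. Results are summarized by averaging accuracy over integer thresholds from 1 to 10 degrees. The helper names (`rotation_error_deg`, `translation_error_deg`, `pose_error_deg`, `mean_average_accuracy`) and the threshold range are hypothetical. They are not the benchmark's actual API.

```python
import numpy as np


def rotation_error_deg(R_gt, R_est):
    """Angle (degrees) of the residual rotation R_gt^T @ R_est."""
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def translation_error_deg(t_gt, t_est):
    """Angle (degrees) between translation directions; scale is ignored."""
    a = np.asarray(t_gt, dtype=float)
    b = np.asarray(t_est, dtype=float)
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def pose_error_deg(R_gt, t_gt, R_est, t_est):
    """Assumption: a pair is only as good as its worse angular error."""
    return max(rotation_error_deg(R_gt, R_est),
               translation_error_deg(t_gt, t_est))


def mean_average_accuracy(errors_deg, max_threshold_deg=10):
    """Mean over thresholds 1..max of the fraction of pairs with error <= threshold.

    Assumption: the 1..10 degree range is illustrative, not taken from the text.
    """
    errors = np.asarray(errors_deg, dtype=float)
    thresholds = np.arange(1, max_threshold_deg + 1)
    accuracies = [np.mean(errors <= th) for th in thresholds]
    return float(np.mean(accuracies))


def rot_z(deg):
    """Rotation about the z-axis, used here only to build a toy example."""
    th = np.radians(deg)
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    # Toy pair: the estimate is rotated 5 degrees and its translation is 45 degrees off.
    err = pose_error_deg(np.eye(3), [1.0, 0.0, 0.0], rot_z(5.0), [1.0, 1.0, 0.0])
    print(round(err, 3))  # 45.0: the translation error dominates
    print(round(mean_average_accuracy([0.5, 3.0, 20.0]), 3))  # 0.6
```

Taking the maximum of the two errors counts a pair as correct only when both the rotation and the translation direction are accurate. This is one reasonable design choice. Averaging over a range of thresholds avoids tying the ranking to a single cut-off.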
