Wide-Baseline Image Matching with Projective View Synthesis and Calibrated Geometric Verification

Image matching is a fundamental task in photogrammetry and computer vision. While effective solutions exist for narrow-baseline viewing conditions, using detectors based, e.g., on differences of Gaussians (DoG) and descriptors such as the scale-invariant feature transform (SIFT), it remains a challenging problem for wide-baseline configurations. This is particularly true when dealing with UAV-based (unmanned aerial vehicle) images together with images taken from the ground. In this paper, we propose a method for wide-baseline image matching that extends the current state-of-the-art approach, matching on demand with view synthesis (MODS), in such a way that even more extreme wide-baseline problems can be solved. We achieve this (1) by making use of projective transformations during view synthesis to overcome the limitations induced by the approximate character of affine transformations and (2) by estimating the essential matrix within geometric verification to filter incorrect correspondences more robustly when the camera calibration is known. We have evaluated our approach on several challenging image pairs, mainly consisting of a UAV-based image together with an image taken from the ground, and demonstrate improved performance compared to MODS.
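To make the two contributions concrete, the following Python/OpenCV sketch illustrates the general idea under stated assumptions: a synthetic view is generated with a projective warp H = K R K^-1 (an out-of-plane camera rotation, which an affine transformation can only approximate), and the resulting SIFT correspondences are verified with a RANSAC estimate of the essential matrix using the known calibration. This is not the authors' implementation; the tilt angle, the ratio-test threshold, the single shared camera matrix K, and all function names are illustrative assumptions.

import cv2
import numpy as np

def synthesize_projective_view(img, K, tilt_deg, axis="x"):
    """Warp an image with H = K R K^-1, i.e. a projective (not merely
    affine) transformation simulating an out-of-plane camera rotation."""
    t = np.deg2rad(tilt_deg)
    if axis == "x":
        R = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(t), -np.sin(t)],
                      [0.0, np.sin(t),  np.cos(t)]])
    else:
        R = np.array([[ np.cos(t), 0.0, np.sin(t)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(t), 0.0, np.cos(t)]])
    H = K @ R @ np.linalg.inv(K)
    h, w = img.shape[:2]
    return cv2.warpPerspective(img, H, (w, h)), H

def match_and_verify(img1, img2, K, tilt_deg=30.0):
    """Match SIFT features between img1 and a projectively synthesized
    view of img2, map the matches back to the original image, and keep
    only correspondences consistent with an essential matrix (known K)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)

    synth, H = synthesize_projective_view(img2, K, tilt_deg)
    kp2, des2 = sift.detectAndCompute(synth, None)

    # Lowe-style ratio test on brute-force matches (threshold is an assumption).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2_synth = np.float32([kp2[m.trainIdx].pt for m in good])

    # Undo the synthesis warp so the correspondences refer to the original img2.
    pts2 = cv2.perspectiveTransform(pts2_synth.reshape(-1, 1, 2),
                                    np.linalg.inv(H)).reshape(-1, 2)

    # Calibrated geometric verification: estimate the essential matrix
    # instead of the fundamental matrix, since K is known.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    inliers = mask.ravel().astype(bool)
    return pts1[inliers], pts2[inliers], E

In a view-synthesis setting one would typically generate several synthetic views (different tilt angles and axes), match each of them, and pool the verified correspondences; the single-tilt call above only sketches one such step.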
