Progressive Large Scale-Invariant Image Matching in Scale Space

The power of modern image matching approaches is still fundamentally limited by abrupt scale changes between images. In this paper, we propose a scale-invariant image matching approach that tackles very large scale variation between views. Drawing inspiration from scale space theory, we start by encoding the image’s scale space into a compact multi-scale representation. Then, rather than trying to find the exact feature matches in a single step, we propose a progressive two-stage approach. First, we determine the related scale levels in scale space, which enclose the inlier feature correspondences, by an optimal and exhaustive matching in a limited scale space. Second, we restrict matching to the related scale levels in a robust way and produce both the image similarity measurement and the feature correspondences simultaneously. The matching performance has been extensively evaluated on vision tasks including image retrieval, feature matching and Structure-from-Motion (SfM). The successful fusion of high aerial and low ground-level views, a challenging case with significant scale differences, demonstrates the superiority of the proposed approach.
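
The two-stage idea can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes each image is already given as a list of descriptor arrays, one per scale level, and it replaces the paper's matching and robust estimation with plain mutual nearest-neighbour matching under a distance threshold. The function names, the `max_sq_dist` parameter and the use of a single level offset are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_nn_matches(desc_a, desc_b, max_sq_dist=1.0):
    """Return index pairs (i, j) that are mutual nearest neighbours
    and closer than max_sq_dist (squared Euclidean distance)."""
    if len(desc_a) == 0 or len(desc_b) == 0:
        return []
    # Pairwise squared distances, shape (len(desc_a), len(desc_b)).
    d = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    nn_ab = d.argmin(axis=1)  # best match in b for each descriptor in a
    nn_ba = d.argmin(axis=0)  # best match in a for each descriptor in b
    return [
        (i, int(j))
        for i, j in enumerate(nn_ab)
        if nn_ba[j] == i and d[i, j] <= max_sq_dist
    ]


def related_level_offset(levels_a, levels_b, max_sq_dist=1.0):
    """Stage 1 (sketch): match every pair of scale levels exhaustively
    and vote for the level offset that collects the most matches."""
    votes = {}
    for la, da in enumerate(levels_a):
        for lb, db in enumerate(levels_b):
            n = len(mutual_nn_matches(da, db, max_sq_dist))
            votes[lb - la] = votes.get(lb - la, 0) + n
    return max(votes, key=votes.get)


def restricted_matches(levels_a, levels_b, offset, max_sq_dist=1.0):
    """Stage 2 (sketch): match only the level pairs related by `offset`.
    Returns tuples (level_a, index_a, level_b, index_b)."""
    out = []
    for la, da in enumerate(levels_a):
        lb = la + offset
        if 0 <= lb < len(levels_b):
            pairs = mutual_nn_matches(da, levels_b[lb], max_sq_dist)
            out += [(la, i, lb, j) for i, j in pairs]
    return out
```

In this sketch the number of matches returned by `restricted_matches` could serve as a crude stand-in for an image similarity score, so that similarity and correspondences come out of the same step; the measure actually used in the paper is not specified in the abstract.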
