On the consistency of the SIFT Method

This note presents the mathematical arguments proving that Lowe’s Scale-Invariant Feature Transform (SIFT [23]), a very successful image matching method, is indeed similarity invariant. The proof is given under the assumption that the Gaussian smoothing performed by SIFT yields aliasing-free sampling. The validity of this main assumption is confirmed by a rigorous experimental procedure. These results explain why SIFT outperforms all other image feature extraction methods when it comes to scale invariance.
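
The aliasing-free sampling assumption can be illustrated numerically. The following is a minimal sketch, not the experimental procedure of the note: it blurs a 1-D white-noise signal with a sampled Gaussian and measures the fraction of spectral energy lying above the Nyquist frequency of a grid subsampled by 2, that is, the energy that would alias. The function names, the 1-D setting, the subsampling factor, and the value sigma = 1.6 are illustrative assumptions.

```python
import cmath
import math
import random


def gaussian_kernel(sigma):
    """Sampled, normalized Gaussian truncated at about 4 sigma."""
    radius = max(1, int(math.ceil(4.0 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_circular(signal, sigma):
    """Circular convolution of a 1-D signal with a Gaussian; sigma <= 0 means no blur."""
    if sigma <= 0:
        return list(signal)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    return [
        sum(kernel[j + radius] * signal[(i - j) % n] for j in range(-radius, radius + 1))
        for i in range(n)
    ]


def energy_above(signal, cutoff):
    """Fraction of spectral energy at frequencies (cycles/sample) strictly above cutoff."""
    n = len(signal)
    total = 0.0
    high = 0.0
    for k in range(n):
        # Naive DFT coefficient; fine for the short signals used here.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        power = abs(coeff) ** 2
        freq = min(k, n - k) / n  # absolute frequency of bin k
        total += power
        if freq > cutoff:
            high += power
    return high / total


def aliasing_ratio(signal, sigma, step=2):
    """Energy fraction that would fold back (alias) when keeping every step-th sample."""
    return energy_above(blur_circular(signal, sigma), 0.5 / step)


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(256)]  # hypothetical test signal
print("no blur:", aliasing_ratio(noise, 0.0))
print("sigma = 1.6:", aliasing_ratio(noise, 1.6))
```

Under these assumptions, roughly half of the energy of the unblurred noise lies above the new Nyquist frequency, while after the Gaussian blur the aliasing fraction falls well below one percent. This is only a toy check of the kind of property the assumption requires.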

[1] Christopher G. Harris, et al. A Combined Corner and Edge Detector, 1988, Alvey Vision Conference.

[2] T. Lindeberg, et al. Scale-Space Theory: A Basic Tool for Analysing Structures at Different Scales, 1994.

[3] Tony Lindeberg, et al. Shape-Adapted Smoothing in Estimation of 3-D Depth Cues from Affine Distortions of Local 2-D Brightness Structure, 1994, ECCV.

[4] Adam Baumberg, et al. Reliable feature matching across widely separated views, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000.

[5] Cordelia Schmid, et al. Indexing Based on Scale Invariant Interest Points, 2001, ICCV.

[6] Cordelia Schmid, et al. An Affine Invariant Interest Point Detector, 2002, ECCV.

[7] Jiri Matas, et al. Robust wide-baseline stereo from maximally stable extremal regions, 2004, Image Vis. Comput.

[8] Yann Gousseau, et al. Unsupervised thresholds for shape matching, 2003, Proceedings 2003 International Conference on Image Processing.

[9] Matthew A. Brown, et al. Recognising panoramas, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10] Vincent Lepetit, et al. Stable real-time 3D tracking using online and offline information, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Pietro Perona, et al. Common-Frame Model for Object Recognition, 2004, NIPS.

[12] Maneesh Agrawala, et al. Video-based document tracking: unifying your physical and electronic desktops, 2004, UIST '04.

[13] T. Tuytelaars, et al. Matching Widely Separated Views Based on Affine Invariant Regions, 2004, International Journal of Computer Vision.

[14] Andrew Zisserman, et al. An Affine Invariant Salient Region Detector, 2004, ECCV.

[15] R. Sukthankar, et al. PCA-SIFT: a more distinctive representation for local image descriptors, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[16] Cordelia Schmid, et al. Scale & Affine Invariant Interest Point Detectors, 2004, International Journal of Computer Vision.

[17] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004, International Journal of Computer Vision.

[18] Jonathon S. Hare, et al. Salient Regions for Query by Image Content, 2004, CIVR.

[19] Chia-Ling Tsai, et al. Alignment of challenging image pairs: Refinement and region growing starting from a single keypoint correspondence, 2005.

[20] Manuela M. Veloso, et al. Learning visual object definitions by observing human activities, 2005, 5th IEEE-RAS International Conference on Humanoid Robots, 2005.

[21] Cordelia Schmid, et al. A Comparison of Affine Region Detectors, 2005, International Journal of Computer Vision.

[22] Cordelia Schmid, et al. A performance evaluation of local descriptors, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Edward Y. Chang, et al. EXTENT: fusing context, content, and semantic ontology for photo annotation, 2005, CVDB '05.

[24] Manish Kumar, et al. Building Detection from Mobile Imagery Using Informative SIFT Descriptors, 2005, SCIA.

[25] Amaury Nègre, et al. Comparative Study of People Detection in Surveillance Scenes, 2006, SSPR/SPR.

[26] David G. Lowe, et al. What and Where: 3D Object Recognition with Accurate Pose, 2006, Toward Category-Level Object Recognition.

[27] Laurent Amsaleg, et al. Scalability of local image descriptors: a comparative study, 2006, MM '06.

[28] Quanfu Fan, et al. Matching slides to presentation videos using SIFT and scene background matching, 2006, MIR '06.

[29] Yann Gousseau, et al. An A Contrario Decision Method for Shape Element Recognition, 2006, International Journal of Computer Vision.

[30] Maarten Vergauwen, et al. Web-based 3D Reconstruction Service, 2006, Machine Vision and Applications.

[31] Wolfram Burgard, et al. Metric Localization with Scale-Invariant Visual Features Using a Single Perspective Camera, 2006, EUROS.

[32] Edward Y. Chang, et al. Fotofiti: web service for photo management, 2006, MM '06.

[33] Benjamin Kuipers, et al. Building Local Safety Maps for a Wheelchair Robot using Vision and Lasers, 2006, The 3rd Canadian Conference on Computer and Robot Vision (CRV'06).

[34] Matthew Toews, et al. Fundamental Matrix Estimation via TIP - Transfer of Invariant Parameters, 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[35] David Nistér, et al. Scalable Recognition with a Vocabulary Tree, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[36] Michael F. Cohen, et al. Photographing long scenes with multi-viewpoint panoramas, 2006, ACM Trans. Graph.

[37] Jun Jie Foo, et al. Pruning SIFT for Scalable Near-duplicate Image Matching, 2007, ADC.

[38] Julien Rabin, et al. A contrario matching of local descriptors, 2007.

[39] Javier Ruiz-del-Solar, et al. A New Approach for Fingerprint Verification Based on Wide Baseline Matching Using Local Interest Points and Descriptors, 2007, PSIVT.

[40] Keiji Yanai. Image collector III: a web image-gathering system with bag-of-keypoints, 2007, WWW '07.

[41] Mubarak Shah, et al. A 3-dimensional sift descriptor and its application to action recognition, 2007, ACM Multimedia.

[42] Jean-Michel Morel, et al. A Theory of Shape Identification, 2008.