Is SIFT scale invariant?

This note is devoted to a mathematical exploration of whether Lowe's Scale-Invariant Feature Transform (SIFT) [21], a very successful image matching method, is similarity invariant as claimed. It is proved that the method is scale invariant only if the initial image blurs are exactly guessed. Yet even a large error on the initial blur is quickly attenuated by this multiscale method as the scale of analysis increases. In consequence, its scale invariance is almost perfect. The mathematical arguments are given under the assumption that the Gaussian smoothing performed by SIFT gives an aliasing-free sampling of the image evolution. The validity of this main assumption is confirmed by a rigorous experimental procedure and by a mathematical proof. These results explain why SIFT outperforms all other image feature extraction methods when it comes to scale invariance.
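The attenuation of an initial-blur error can be illustrated with a small numerical sketch. It assumes only the Gaussian semigroup property (successive Gaussian blurs add in quadrature). The function names and the numerical values (an assumed blur of 0.5, a true blur of 1.0, nominal scales starting at 1.6) are illustrative assumptions, not taken from the note itself.

```python
import math

def actual_blur(sigma_nominal, c_assumed, c_true):
    """Blur really reached when the method aims at sigma_nominal.

    Assumption: the method adds a Gaussian of variance
    sigma_nominal**2 - c_assumed**2, believing the input blur is c_assumed,
    while the input blur is in fact c_true. Variances add (semigroup property).
    """
    added_variance = sigma_nominal ** 2 - c_assumed ** 2
    if added_variance < 0:
        raise ValueError("nominal scale must be at least the assumed blur")
    return math.sqrt(added_variance + c_true ** 2)

def relative_scale_error(sigma_nominal, c_assumed, c_true):
    """Relative gap between the reached blur and the nominal scale."""
    return actual_blur(sigma_nominal, c_assumed, c_true) / sigma_nominal - 1.0

# Illustrative values only: the initial blur is wrongly guessed by a factor of 2.
C_ASSUMED, C_TRUE = 0.5, 1.0
sigmas = [1.6 * 2 ** k for k in range(4)]  # one nominal scale per octave
errors = [relative_scale_error(s, C_ASSUMED, C_TRUE) for s in sigmas]

for s, e in zip(sigmas, errors):
    print(f"nominal sigma = {s:5.1f}   relative scale error = {e:.4f}")
```

Under these assumptions the absolute variance error stays constant while the nominal variance grows, so the relative error shrinks roughly by a factor of four per octave. When the guess is exact (`c_assumed == c_true`) the error vanishes at every scale.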

[1]  C. E. Shannon,et al.  A mathematical theory of communication , 1948, MOCO.

[2]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[3]  G. F. Roach,et al.  Inverse problems and imaging , 1991 .

[4]  T. Lindeberg,et al.  Scale-Space Theory : A Basic Tool for Analysing Structures at Different Scales , 1994 .

[5]  Tony Lindeberg,et al.  Shape-Adapted Smoothing in Estimation of 3-D Depth Cues from Affine Distortions of Local 2-D Brightness Structure , 1994, ECCV.

[6]  Pascal Monasse,et al.  Contrast invariant registration of images , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[7]  Adam Baumberg,et al.  Reliable feature matching across widely separated views , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[8]  Cordelia Schmid,et al.  Indexing Based on Scale Invariant Interest Points , 2001, ICCV.

[9]  Claude E. Shannon,et al.  A mathematical theory of communication , 1948, MOCO.

[10]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[11]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput.

[12]  Yann Gousseau,et al.  Unsupervised thresholds for shape matching , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[13]  Matthew A. Brown,et al.  Recognising panoramas , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  Vincent Lepetit,et al.  Stable real-time 3D tracking using online and offline information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Pietro Perona,et al.  Common-Frame Model for Object Recognition , 2004, NIPS.

[16]  Maneesh Agrawala,et al.  Video-based document tracking: unifying your physical and electronic desktops , 2004, UIST '04.

[17]  T. Tuytelaars,et al.  Matching Widely Separated Views Based on Affine Invariant Regions , 2004, International Journal of Computer Vision.

[18]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[19]  Andrew Zisserman,et al.  An Affine Invariant Salient Region Detector , 2004, ECCV.

[20]  R. Sukthankar,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[21]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[22]  Jonathon S. Hare,et al.  Salient Regions for Query by Image Content , 2004, CIVR.

[23]  Manuela M. Veloso,et al.  Learning visual object definitions by observing human activities , 2005, 5th IEEE-RAS International Conference on Humanoid Robots, 2005.

[24]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[25]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Edward Y. Chang,et al.  EXTENT: fusing context, content, and semantic ontology for photo annotation , 2005, CVDB '05.

[27]  Manish Kumar,et al.  Building Detection from Mobile Imagery Using Informative SIFT Descriptors , 2005, SCIA.

[28]  Amaury Nègre,et al.  Comparative Study of People Detection in Surveillance Scenes , 2006, SSPR/SPR.

[29]  David G. Lowe,et al.  What and Where: 3D Object Recognition with Accurate Pose , 2006, Toward Category-Level Object Recognition.

[30]  Laurent Amsaleg,et al.  Scalability of local image descriptors: a comparative study , 2006, MM '06.

[31]  Quanfu Fan,et al.  Matching slides to presentation videos using SIFT and scene background matching , 2006, MIR '06.

[32]  Yann Gousseau,et al.  An A Contrario Decision Method for Shape Element Recognition , 2006, International Journal of Computer Vision.

[33]  Maarten Vergauwen,et al.  Web-based 3D Reconstruction Service , 2006, Machine Vision and Applications.

[34]  Wolfram Burgard,et al.  Metric Localization with Scale-Invariant Visual Features Using a Single Perspective Camera , 2006, EUROS.

[35]  Edward Y. Chang,et al.  Fotofiti: web service for photo management , 2006, MM '06.

[36]  Benjamin Kuipers,et al.  Building Local Safety Maps for a Wheelchair Robot using Vision and Lasers , 2006, The 3rd Canadian Conference on Computer and Robot Vision (CRV'06).

[37]  Matthew Toews,et al.  Fundamental Matrix Estimation via TIP - Transfer of Invariant Parameters , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[38]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[39]  Michael F. Cohen,et al.  Photographing long scenes with multi-viewpoint panoramas , 2006, ACM Trans. Graph.

[40]  Jun Jie Foo,et al.  Pruning SIFT for Scalable Near-duplicate Image Matching , 2007, ADC.

[41]  Javier Ruiz-del-Solar,et al.  A New Approach for Fingerprint Verification Based on Wide Baseline Matching Using Local Interest Points and Descriptors , 2007, PSIVT.

[42]  Keiji Yanai.  Image collector III: a web image-gathering system with bag-of-keypoints , 2007, WWW '07.

[43]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[44]  Jean-Michel Morel,et al.  ASIFT: A New Framework for Fully Affine Invariant Image Comparison , 2009, SIAM J. Imaging Sci..

[45]  Julien Rabin,et al.  A Statistical Approach to the Matching of Local Features , 2009, SIAM J. Imaging Sci..