ASIFT: A New Framework for Fully Affine Invariant Image Comparison

If a physical object has a smooth or piecewise smooth boundary, its images obtained by cameras in varying positions undergo smooth apparent deformations. These deformations are locally well approximated by affine transforms of the image plane. In consequence, the solid object recognition problem has often been reduced to the computation of affine invariant local image features. Such invariant features could be obtained by normalization methods, but no fully affine normalization method currently exists. Even scale invariance is dealt with rigorously only by the scale-invariant feature transform (SIFT) method. By simulating zooms out and normalizing translation and rotation, SIFT is invariant to four out of the six parameters of an affine transform. The method proposed in this paper, affine-SIFT (ASIFT), simulates all image views obtainable by varying the two camera axis orientation parameters, namely, the latitude and the longitude angles, which are left over by the SIFT method. It then covers the other four parameters by using the SIFT method itself. The resulting method will be mathematically proved to be fully affine invariant. Contrary to expectation, simulating all views depending on the two camera orientation parameters is feasible with no dramatic computational load. A two-resolution scheme further reduces the ASIFT complexity to about twice that of SIFT. A new notion, the transition tilt, which measures the amount of distortion from one view to another, is introduced. While an absolute tilt from a frontal to a slanted view exceeding 6 is rare, much higher transition tilts are common when two slanted views of an object are compared (see the figure illustrating high transition tilts). The attainable transition tilt is measured for each affine image comparison method. The new method permits one to reliably identify features that have undergone transition tilts of large magnitude, up to 36 and higher. This fact is substantiated by many experiments which show that ASIFT significantly outperforms the state-of-the-art methods SIFT, maximally stable extremal region (MSER), Harris-affine, and Hessian-affine.
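The relation between absolute tilts of at most 6 and transition tilts of up to 36 can be checked numerically. The sketch below is an illustration only, not the authors' implementation. It assumes that each view is modeled, up to zoom and in-plane camera rotation, by the matrix T_t R(phi) with T_t = diag(t, 1) and R(phi) a rotation by the longitude angle, and that the transition tilt between two views A and B is the ratio of the singular values of B A^{-1}. The function names are hypothetical.

```python
import math

def view_matrix(tilt, longitude_deg):
    """Assumed view model: T_t R(phi), with T_t = diag(tilt, 1)."""
    phi = math.radians(longitude_deg)
    c, s = math.cos(phi), math.sin(phi)
    return ((tilt * c, -tilt * s), (s, c))

def matmul2(m, n):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse2(m):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def tilt_of(m):
    """Ratio of the larger to the smaller singular value of a 2x2 matrix."""
    (a, b), (c, d) = m
    # Closed form for 2x2 matrices: the singular values are (p + q) / 2
    # and |p - q| / 2.
    p = math.hypot(a + d, b - c)
    q = math.hypot(a - d, b + c)
    return (p + q) / abs(p - q)

def transition_tilt(view_a, view_b):
    """Tilt of the affine map that takes view_a to view_b."""
    return tilt_of(matmul2(view_b, inverse2(view_a)))

if __name__ == "__main__":
    a = view_matrix(6.0, 0.0)
    b = view_matrix(6.0, 90.0)
    print(transition_tilt(a, b))
```

Under these assumptions, two views that each have absolute tilt 6 but whose longitudes differ by 90 degrees are related by a transition tilt of 6 × 6 = 36, while two views with identical tilt and longitude have transition tilt 1.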
