Covering the Space of Tilts. Application to Affine Invariant Image Comparison

We propose a mathematical method to analyze the numerous algorithms that perform Image Matching by Affine Simulation (IMAS). To become affine invariant, these algorithms apply a discrete set of affine transforms to the images before all the resulting images are compared by a Scale Invariant Image Matching (SIIM) method such as SIFT. This multiplication of the images to be compared increases the image matching complexity. Three questions arise: a) what is the best set of affine transforms to apply to each image to gain full practical affine invariance? b) what is the lowest attainable complexity for the resulting method? c) how should the underlying SIIM method be chosen? We provide an explicit answer to the first question, together with a mathematical proof that the solution is quasi-optimal. As an answer to b), we find that the near-optimal complexity ratio between full affine matching and scale invariant matching is more than halved compared to current IMAS methods. This means that the number of key points necessary for affine matching can be halved and, because the cost of comparing all descriptor pairs grows with the product of the two descriptor counts, that the matching complexity is divided by four for exactly the same performance. It also means that an affine invariant set of descriptors can be associated with any image. The price to pay for full affine invariance is that the cardinality of this set is 6.34 times larger than for a SIIM.
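As an illustration of what a "tilt" is, the sketch below assumes the decomposition commonly used in the affine-simulation literature, A = λ R(ψ) T_t R(φ) with T_t = diag(t, 1), t ≥ 1 and λ > 0, and recovers the zoom λ and the tilt t of a 2×2 affine map from its singular values. The function names `compose` and `tilt_decomposition` are hypothetical helpers introduced here for illustration; they are not taken from the paper.

```python
import math

def compose(lam, psi, t, phi):
    """Build A = lam * R(psi) * diag(t, 1) * R(phi) as a tuple (a, b, c, d).

    Assumed convention: R(x) = [[cos x, -sin x], [sin x, cos x]].
    """
    cp, sp = math.cos(psi), math.sin(psi)
    cf, sf = math.cos(phi), math.sin(phi)
    a = lam * (t * cp * cf - sp * sf)
    b = lam * (-t * cp * sf - sp * cf)
    c = lam * (t * sp * cf + cp * sf)
    d = lam * (-t * sp * sf + cp * cf)
    return a, b, c, d

def tilt_decomposition(a, b, c, d):
    """Return (lam, t) for A = [[a, b], [c, d]] with det(A) > 0.

    lam is the smaller singular value of A and t is the ratio of the
    larger singular value to the smaller one, so t >= 1.
    """
    det = a * d - b * c
    if det <= 0:
        raise ValueError("expected an orientation-preserving map (det > 0)")
    # Eigenvalues of A A^T are (trace +/- gap) / 2.
    trace = a * a + b * b + c * c + d * d
    gap = math.sqrt((a * a + b * b - c * c - d * d) ** 2
                    + 4.0 * (a * c + b * d) ** 2)
    sigma_max = math.sqrt((trace + gap) / 2.0)
    sigma_min = det / sigma_max  # product of singular values equals |det|
    return sigma_min, sigma_max / sigma_min

if __name__ == "__main__":
    A = compose(lam=1.5, psi=0.3, t=2.0, phi=1.1)
    lam, t = tilt_decomposition(*A)
    print(f"zoom = {lam:.6f}, tilt = {t:.6f}")
```

In this picture, a SIIM method is expected to absorb the zoom λ and the rotation, so an IMAS method only needs its simulated transforms to cover the remaining tilt and its direction.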
