Spry features

The fundamental contribution of this thesis is Spry, a new framework for spry (agile, nimble) object/scene recognition. Spry presents a new affine-invariant recognition paradigm that typically produces results at least as good as, if not better than, current state-of-the-art algorithms while being significantly faster. To achieve this, Spry develops new technologies that stand on their own as contributions to the fields of Image Processing and Computer Vision. First, we present a novel, fully automatic method to create box filters that achieve an excellent approximation of any arbitrary 2-D filter. We approximate the original filter by a weighted sum of individual box filters and formulate the filter approximation as an optimization problem. We present two algorithms that can determine the optimal location of the box filters: Exhaustive Search and Directed Search. We show that both algorithms find good approximations to general filters. Second, we develop methods that are invariant to a general affine transformation by decomposing it into four distortions with geometric meaning: rotation in the object plane; rotation in the image plane; isotropic scaling; and tilt (anisotropic scaling). We develop an affine-space function that achieves invariance to all these distortions and we show that it can be computed by convolution of a filter bank with the input image. Third, we introduce iterative feature detection and description. In contrast with all relevant state-of-the-art methods, which detect features sequentially in a single pass, our method is iterative: it detects, describes, and matches features in batches. Fourth, we show that it is possible to detect and describe features iteratively without fully filtering the input image.
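As a rough illustration of the weighted-sum formulation of the box-filter approximation described above, the sketch below fits only the weights, assuming the box locations are already fixed. In that case the optimization reduces to ordinary linear least squares. The helper names `box_indicator` and `fit_box_weights` and the half-open box convention `(r0, c0, r1, c1)` are hypothetical choices made for this example; the Exhaustive Search and Directed Search algorithms that choose the box locations are not reproduced here.

```python
import numpy as np


def box_indicator(shape, box):
    """Return an array that is 1 inside the half-open box (r0, c0, r1, c1), 0 elsewhere."""
    r0, c0, r1, c1 = box
    mask = np.zeros(shape, dtype=float)
    mask[r0:r1, c0:c1] = 1.0
    return mask


def fit_box_weights(target, boxes):
    """Least-squares weights for a FIXED set of boxes (box placement is assumed given)."""
    # Each column of the basis is one flattened box indicator.
    basis = np.stack(
        [box_indicator(target.shape, b).ravel() for b in boxes], axis=1
    )
    weights, *_ = np.linalg.lstsq(basis, target.ravel(), rcond=None)
    approx = (basis @ weights).reshape(target.shape)
    return weights, approx


# Toy target: a 5x5 filter that is exactly a weighted sum of two boxes.
boxes = [(0, 0, 5, 5), (1, 1, 4, 4)]
target = 2.0 * box_indicator((5, 5), boxes[0]) - 3.0 * box_indicator((5, 5), boxes[1])

weights, approx = fit_box_weights(target, boxes)
residual = np.linalg.norm(target - approx)
```

For a filter that is not an exact combination of the chosen boxes, `residual` gives the approximation error that a search over box locations would try to reduce.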
We develop a novel greedy algorithm for iterative feature detection and description that works by randomly choosing a starting location in the scale space and then exploring the neighborhood of that location until a suitable feature point is found. We introduce a new descriptor, a modified version of the SIFT descriptor, that uses the observation information in order to be fully affine invariant. Fifth, we show that traditional methods of feature matching are not appropriate for our iterative framework and instead introduce a new feature matching algorithm based on the use of an Iterative k-Dimensional Tree. We show that this new data structure is ideal for applications in which the number of features increases at runtime and demonstrate, both theoretically and experimentally, that its performance is superior to traditional methods for matching features in growing databases. Sixth, we present three alternative approaches to the decision task. In the first approach, Homography Estimation and Verification, we find groups of features that can be related by homographies and only report matches that agree with one of the homographies in the image. In the second approach, LASIC, we formulate the decision problem as a hypothesis test and derive the uniformly most powerful (UMP) test that is affine invariant. We formulate the matching problem as a quadratic maximization in the space of permutation matrices and present an efficient algorithm to solve this optimization problem. In the third approach, Shapes as Empirical Distributions, we interpret the shape of an object as a probability distribution governing the location of the features of the object and interpret an image of an object as a random drawing from the shape distribution. We use Maximum Likelihood and formulate the decision problem associated with shape classification as a hypothesis test for which we characterize the performance.
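The verification step of the first decision approach, reporting only matches that agree with one of the estimated homographies, might look like the minimal sketch below. It assumes the homographies have already been estimated by some other means. The function names and the pixel tolerance `tol` are hypothetical and chosen for illustration only.

```python
import numpy as np


def apply_homography(H, points):
    """Map an Nx2 array of points through the 3x3 homography H."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    mapped = homogeneous @ np.asarray(H, dtype=float).T
    # Divide by the third coordinate to return to inhomogeneous coordinates.
    return mapped[:, :2] / mapped[:, 2:3]


def matches_agreeing_with_any(homographies, src, dst, tol=3.0):
    """Boolean mask of matches whose reprojection error is within tol for some homography.

    tol is an assumed threshold in pixels, not a value taken from the thesis.
    """
    dst = np.asarray(dst, dtype=float)
    keep = np.zeros(len(dst), dtype=bool)
    for H in homographies:
        error = np.linalg.norm(apply_homography(H, src) - dst, axis=1)
        keep |= error <= tol
    return keep


# Two toy homographies: a pure translation and a uniform scaling.
shift = np.array([[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]])
scale = np.diag([2.0, 2.0, 1.0])

src = [[0, 0], [1, 1], [4, 4], [2, 3]]
dst = [[10, 5], [11, 6], [8, 8], [50, 50]]  # the last match fits neither model

keep = matches_agreeing_with_any([shift, scale], src, dst)
```

Here the first two matches agree with the translation, the third with the scaling, and the fourth is rejected because it agrees with neither.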
Finally, we demonstrate how all the above contributions come together to create Spry. Multiple experimental results corroborate the superiority of Spry over all current competitive state-of-the-art algorithms in the field.
