Sparse representation for coarse and fine object recognition

This paper presents a sparse, multiscale representation of objects. It captures object appearance by selecting from a very large dictionary of Gaussian differential basis functions. The learning procedure is derived from the matching pursuit algorithm, while recognition is based on a polynomial approximation to the bases, which turns image matching into a problem of polynomial evaluation. The method is suited to coarse recognition between objects and, by adding more bases, also to fine recognition of the object pose. It has three advantages over the common PCA-based representation. First, sampled points need not be stored for recognition. Second, adding new objects to an existing data set is trivial because the other object models need no retraining. Third, and significantly, in the important case where an image must be scanned over multiple locations in search of an object, the new representation is readily available at each location, whereas PCA requires a projection at each location. Experimental results on the COIL-100 data set demonstrate high recognition accuracy with real-time performance.
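
The selection step named in the abstract can be illustrated with a generic matching pursuit loop. The following is a minimal 1-D sketch, not the paper's implementation: the function names, the scales, the derivative orders (0 to 2), and the use of sampled, unit-norm atoms are all assumptions made for illustration, and the polynomial approximation used for recognition is not shown.

```python
import math

def gaussian_derivative_atom(n, center, sigma, order):
    """Sampled Gaussian (order 0) or its 1st/2nd derivative shape, unit L2 norm.
    Hypothetical helper; constant factors are dropped since the atom is normalized."""
    atom = []
    for i in range(n):
        t = (i - center) / sigma
        g = math.exp(-0.5 * t * t)
        if order == 0:
            atom.append(g)
        elif order == 1:
            atom.append(-t * g)
        else:
            atom.append((t * t - 1.0) * g)
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom]

def build_dictionary(n, sigmas=(1.0, 2.0, 4.0), orders=(0, 1, 2)):
    """Overcomplete dictionary: every scale, order, and center (assumed layout)."""
    return [gaussian_derivative_atom(n, c, s, o)
            for s in sigmas for o in orders for c in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy selection: repeatedly pick the atom most correlated with the
    residual, record its coefficient, and subtract its contribution."""
    residual = list(signal)
    selected = []  # list of (atom index, coefficient)
    for _ in range(n_atoms):
        scores = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda j: abs(scores[j]))
        coeff = scores[k]
        residual = [r - coeff * a for r, a in zip(residual, dictionary[k])]
        selected.append((k, coeff))
    return selected, residual

if __name__ == "__main__":
    n = 32
    dictionary = build_dictionary(n)
    signal = [math.sin(0.3 * i) for i in range(n)]
    selected, residual = matching_pursuit(signal, dictionary, n_atoms=8)
    print("selected atoms:", [k for k, _ in selected])
    print("residual energy:", sum(r * r for r in residual))
```

Because the atoms have unit norm, each iteration removes exactly the squared coefficient from the residual energy, so adding more atoms can only refine the approximation. This mirrors the abstract's remark that more bases move the representation from coarse to fine recognition.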
