Learning distance functions for exemplar-based object recognition

This thesis investigates an exemplar-based approach to object recognition that learns, on an image-by-image basis, the relative importance of patch-based features for determining similarity. We borrow the idea of "family resemblances" from Wittgenstein's Philosophical Investigations and Eleanor Rosch's psychological studies to support the idea of learning the detailed relationships between images of the same category, which is a departure from some popular machine learning approaches such as Support Vector Machines that seek only the boundaries between categories. We represent images as sets of patch-based features. To find the distance between two images, we first find for each patch its nearest patch in the other image and compute their inter-patch distance. The weighted sum of these inter-patch distances is defined to be the distance between the two images. The main contribution of this thesis is a method for learning a set-to-set distance function specific to each training image and demonstrating the use of these functions for image browsing, retrieval, and classification. The goal of the learning algorithm is to assign a non-negative weight to each patch-based feature of the image such that the most useful patches are assigned large weights and irrelevant or confounding patches are given zero weights. We formulate this as a large-margin optimization, related to the soft-margin Support Vector Machine, and discuss two versions: a "focal" version that learns weights for each image separately, and a "global" version that jointly learns the weights for all training images. In the focal version, the distance functions learned for the training images are not directly comparable to one another and can be most directly applied to in-sample applications such as image browsing, though with heuristics or additional learning, these functions can be used for image retrieval or classification. The global approach, however, learns distance functions that are globally consistent and can be used directly for image retrieval and classification. Using geometric blur and simple color features, we show that both versions perform as well or better than the best-performing algorithms on the Caltech 101 object recognition benchmark. The global version achieves the best results, a 63.2% mean recognition rate when trained with fifteen images per category and 66.6% when trained with twenty.

[1]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[2]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[3]  Trevor Darrell,et al.  Approximate Correspondences in High Dimensions , 2006, NIPS.

[4]  C. Habel,et al.  Language , 1931, NeuroImage.

[5]  Amir Globerson,et al.  Metric Learning by Collapsing Classes , 2005, NIPS.

[6]  Pietro Perona,et al.  Combining generative models and Fisher kernels for object recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7]  Jitendra Malik,et al.  Geometric blur for template matching , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[8]  Yoram Singer,et al.  Online and batch learning of pseudo-metrics , 2004, ICML.

[9]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  János D. Pintér Yair Censor and Stavros A. Zenios, Parallel Optimization — Theory, Algorithms, and Applications. Oxford University Press, New York/Oxford, 1997, xxviii+539 pages. ISBN 0-19-510062-X (US $ 85.00) , 2000, J. Glob. Optim..

[12]  Shimon Ullman,et al.  Recognizing solid objects by alignment with an image , 1990, International Journal of Computer Vision.

[13]  M. V. Rossum,et al.  In Neural Computation , 2022 .

[14]  Jitendra Malik,et al.  Learning a discriminative classifier using shape context distances , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[15]  Jitendra Malik,et al.  Shape contexts enable efficient retrieval of similar shapes , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[16]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[17]  M. F.,et al.  Bibliography , 1985, Experimental Gerontology.

[18]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..

[19]  Gang Wang,et al.  Using Dependent Regions for Object Categorization in a Generative Framework , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[22]  Cordelia Schmid,et al.  Combining greyvalue invariants with local constraints for object recognition , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[24]  Trevor Darrell,et al.  Pyramid Match Kernels: Discriminative Classification with Sets of Image Features (version 2) , 2006 .

[25]  R. Mohr,et al.  Image retrieval using local characterization , 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[26]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[27]  Jitendra Malik,et al.  Image Retrieval and Classification Using Local Distance Functions , 2006, NIPS.

[28]  Martial Hebert,et al.  Fully automatic registration of multiple 3D data sets , 2003, Image Vis. Comput..

[29]  Dana Ron,et al.  Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation , 1997, Neural Computation.

[30]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[31]  E. Rosch,et al.  Family resemblances: Studies in the internal structure of categories , 1975, Cognitive Psychology.

[32]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[33]  Kathleen Daly Volume 7 , 1998 .

[34]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[35]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[37]  Robert Tibshirani,et al.  Classification by Pairwise Coupling , 1997, NIPS.

[38]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Y. Censor,et al.  Parallel Optimization: Theory, Algorithms, and Applications , 1997 .

[40]  Jitendra Malik,et al.  Matching Shapes , 2001, ICCV.

[41]  Linda G. Shapiro,et al.  A new paradigm for recognizing 3-D objects from range data , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[42]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[43]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[44]  Nicolas Le Roux,et al.  Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering , 2003, NIPS.

[45]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[46]  Cordelia Schmid,et al.  Toward Category-Level Object Recognition , 2006, Toward Category-Level Object Recognition.

[47]  S. Crawford,et al.  Volume 1 , 2012, Journal of Diabetes Investigation.

[48]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[49]  Y. Censor,et al.  Parallel Optimization:theory , 1997 .

[50]  E. Rosch,et al.  Categorization of Natural Objects , 1981 .

[51]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[52]  W. Brogden Annual Review of Psychology , 1957 .

[53]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[54]  Andrew W. Moore,et al.  An Investigation of Practical Approximate Nearest Neighbor Algorithms , 2004, NIPS.

[55]  Jitendra Malik,et al.  Recognizing objects in adversarial clutter: breaking a visual CAPTCHA , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[56]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[57]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[58]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[59]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[60]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[61]  Jitendra Malik,et al.  Recognizing Objects in Range Data Using Regional Point Descriptors , 2004, ECCV.

[62]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[63]  P. Kay,et al.  The linguistic significance of the meanings of basic color terms , 1978 .