Efficient Object Recognition and Image Retrieval for Large-Scale Applications

Algorithms for recognition and retrieval tasks generally call for both speed and accuracy. When scaling up to very large applications, however, we encounter additional significant requirements: adaptability and scalability. In many real-world systems, large numbers of images are constantly added to the database, requiring the algorithm to quickly tune itself to recent trends so it can serve queries more effectively. Moreover, the systems need to be able to meet the demands of simultaneous queries from many users. In this thesis, I describe two new algorithms intended to meet these requirements and give an extensive experimental evaluation for both. The first algorithm constructs an adaptive vocabulary forest, which is an efficient image-database model that grows and shrinks as needed while adapting its structure to tune itself to recent trends. The second algorithm is a method for efficiently performing classification tasks by comparing query images to only a fixed number of training examples, regardless of the size of the image database. These two methods can be combined to create a fast, adaptable, and scalable vision system suitable for large-scale applications. I also introduce LIBPMK, a fast implementation of common computer vision processing pipelines such as that of the pyramid match kernel. This implementation was used to build several successful interactive applications as well as batch experiments for research settings. This implementation, in addition to the two new algorithms introduced by this thesis, are a step toward meeting the speed, adaptability, and scalability requirements of practical large-scale vision systems. Thesis Supervisor: Trevor J. Darrell Title: Associate Professor

[1]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[2]  Trevor Darrell,et al.  Towards adaptive object recognition for situated human-computer interaction , 2007, WMISI '07.

[3]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[4]  Jitendra Malik,et al.  Geometric blur for template matching , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[5]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Trevor Darrell,et al.  Dynamic visual category learning , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Trevor Darrell,et al.  Unsupervised Distributed Feature Selection for Multi-view Object Recognition , 2008 .

[9]  Vincent Lepetit,et al.  Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Barbara Caputo,et al.  Recognition with local features: the kernel recipe , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Jitendra Malik,et al.  Shape Context: A New Descriptor for Shape Matching and Object Recognition , 2000, NIPS.

[12]  Ada Wai-Chee Fu,et al.  Incremental Document Clustering for Web Page Classification , 2002 .

[13]  A. Ennaji,et al.  An Incremental Hierarchical Clustering , 1999 .

[14]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[15]  John J. Lee,et al.  LIBPMK: A Pyramid Match Toolkit , 2008 .

[16]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[17]  Trevor Darrell,et al.  Approximate Correspondences in High Dimensions , 2006, NIPS.

[18]  Trevor Darrell,et al.  Multimodal question answering for mobile devices , 2008, IUI '08.

[19]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Manik Varma,et al.  Learning The Discriminative Power-Invariance Trade-Off , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[21]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[22]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[24]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[25]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[26]  Rajeev Motwani,et al.  Incremental Clustering and Dynamic Information Retrieval , 2004, SIAM J. Comput..

[27]  Gert Cauwenberghs,et al.  Incremental and Decremental Support Vector Machine Learning , 2000, NIPS.

[28]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[29]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.