A multi-sample, multi-tree approach to bag-of-words image representation for image retrieval

The state-of-the-art content based image retrieval systems has been significantly advanced by the introduction of SIFT features and the bag-of-words image representation. Converting an image into a bag-of-words, however, involves three non-trivial steps: feature detection, feature description, and feature quantization. At each of these steps, there is a significant amount of information lost, and the resulted visual words are often not discriminative enough for large scale image retrieval applications. In this paper, we propose a novel multi-sample multi-tree approach to computing the visual word codebook. By encoding more information of the original image feature, our approach generates a much more discriminative visual word codebook that is also efficient in terms of both computation and space consumption, without losing the original repeatability of the visual features. We evaluate our approach using both a ground-truth data set and a real-world large scale image database. Our results show that a significant improvement in both precision and recall can be achieved by using the codebook derived from our approach.

[1]  Richard M. Karp,et al.  Efficient Randomized Pattern-Matching Algorithms , 1987, IBM J. Res. Dev..

[2]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[3]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[4]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[6]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[8]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[10]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[11]  Luc Van Gool,et al.  Efficient Mining of Frequent and Distinctive Feature Configurations , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  David G. Lowe,et al.  Shape Descriptors for Maximally Stable Extremal Regions , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[13]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Cordelia Schmid,et al.  Vector Quantizing Feature Space with a Regular Lattice , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15]  Ming Yang,et al.  Discovery of Collocation Patterns: from Visual Words to Visual Phrases , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Matthew A. Brown,et al.  Learning Local Image Descriptors , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Simon A. J. Winder,et al.  Object instance recognition using triplets of feature symbols , 2007 .

[18]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[20]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Krystian Mikolajczyk,et al.  Action recognition with motion-appearance vocabulary forest , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Gang Hua,et al.  Integrated feature selection and higher-order spatial feature extraction for object categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .