Vocabulary hierarchy optimization for effective and transferable retrieval

Scalable image retrieval systems usually rely on hierarchical quantization of local image descriptors, which produces a visual vocabulary for the inverted indexing of images. Although hierarchical quantization makes retrieval efficient, the resulting visual vocabulary usually suffers from two crucial problems: (1) hierarchical quantization errors and biases in the generation of “visual words”; and (2) an inability of the model to adapt to variations across databases. In this paper, we describe an unsupervised strategy for optimizing the hierarchical structure of the visual vocabulary, which produces a more effective and adaptive retrieval model for large-scale search. We adopt a novel density-based metric learning (DML) algorithm that corrects word quantization bias without supervision during hierarchy optimization. Building on the optimized vocabulary hierarchy, we present a hierarchical rejection chain for efficient online search. We also find that hierarchy optimization makes it feasible to transfer a retrieval model across different databases both efficiently and effectively. To validate our advances, we deployed a large-scale image retrieval system based on a vocabulary tree model. Experiments on the UKBench and street-side urban scene databases demonstrate the effectiveness of our hierarchy optimization approach in comparison with state-of-the-art methods.
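The baseline that this work optimizes (hierarchical quantization of local descriptors into a visual vocabulary, followed by inverted indexing) can be illustrated with a minimal vocabulary-tree sketch. It is not the paper's optimized hierarchy: it omits density-based metric learning and the hierarchical rejection chain, and it assumes plain hierarchical k-means with deterministic farthest-point seeding, tf-idf weighting with cosine scoring (one common choice), and 2-D toy points in place of real local descriptors such as SIFT. All function, class, and variable names are hypothetical.

```python
# Minimal vocabulary-tree sketch (hypothetical names; not the paper's optimized hierarchy).
import math
from collections import Counter, defaultdict

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-point seeding (a simplifying assumption)."""
    centroids = [points[0]]
    while len(centroids) < min(k, len(points)):
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to its group mean; keep it in place if the group is empty.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids

class VocabNode:
    def __init__(self, centroid):
        self.centroid, self.children, self.word = centroid, [], None  # word: leaves only

def build_tree(points, branching, depth, leaves, centroid=None):
    """Hierarchical k-means: split recursively until depth runs out or a node is tiny."""
    node = VocabNode(centroid)
    if depth == 0 or len(points) <= branching:
        node.word = len(leaves)  # each leaf becomes one visual word
        leaves.append(node)
        return node
    centroids = kmeans(points, branching)
    groups = [[] for _ in centroids]
    for p in points:
        groups[nearest(p, centroids)].append(p)
    for c, g in zip(centroids, groups):
        if g:
            node.children.append(build_tree(g, branching, depth - 1, leaves, c))
    return node

def quantize(root, desc):
    """Greedy descent: `branching` comparisons per level instead of one per leaf."""
    node = root
    while node.children:
        node = node.children[nearest(desc, [ch.centroid for ch in node.children])]
    return node.word

class InvertedIndex:
    def __init__(self, root):
        self.root = root
        self.postings = defaultdict(dict)  # visual word -> {image_id: count}
        self.doc_words = {}                # image_id -> Counter of visual words

    def add(self, image_id, descriptors):
        counts = Counter(quantize(self.root, d) for d in descriptors)
        self.doc_words[image_id] = counts
        for w, c in counts.items():
            self.postings[w][image_id] = c

    def _tfidf(self, counts):
        """L2-normalised tf-idf vector, idf = log(N / n_w); unseen words are dropped."""
        n = len(self.doc_words)
        vec = {w: c * math.log(n / len(self.postings[w]))
               for w, c in counts.items() if self.postings.get(w)}
        norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
        return {w: x / norm for w, x in vec.items()}

    def query(self, descriptors, top=5):
        q = self._tfidf(Counter(quantize(self.root, d) for d in descriptors))
        scores, doc_vecs = defaultdict(float), {}
        for w, qx in q.items():
            for img in self.postings[w]:  # only images sharing word w are touched
                if img not in doc_vecs:
                    doc_vecs[img] = self._tfidf(self.doc_words[img])
                scores[img] += qx * doc_vecs[img][w]  # accumulates cosine similarity
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top]

# Toy demo: 2-D points stand in for 128-D SIFT descriptors (an assumption).
OFFSETS = [(0, 0), (0.3, 0.1), (-0.2, 0.4), (0.1, -0.3), (-0.4, -0.2)]

def toy_image(centers):
    return [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in OFFSETS]

images = {"a": toy_image([(0, 0), (0, 10)]),
          "b": toy_image([(10, 0), (10, 10)]),
          "c": toy_image([(20, 0), (20, 10)])}
leaves = []
root = build_tree([d for descs in images.values() for d in descs],
                  branching=2, depth=3, leaves=leaves)
index = InvertedIndex(root)
for image_id, descs in images.items():
    index.add(image_id, descs)
ranking = index.query(images["a"])
print(ranking)  # image "a" is expected to rank first
```

Descending the tree costs roughly `branching × depth` distance computations per descriptor rather than one per visual word, which is the efficiency merit of hierarchical quantization. The same greedy descent is also where hierarchical quantization errors arise: a descriptor lying near a partition boundary high in the tree can be routed down the wrong branch and never reach its best-matching leaf.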
