Learning Vocabulary-Based Hashing with AdaBoost

Approximate near neighbor search plays a critical role in many multimedia applications. The vocabulary-based hashing scheme uses vocabularies, i.e., selected sets of feature points, to define a family of hash functions, which can then be used to build an approximate near neighbor search index. The critical problem in vocabulary-based hashing is the criterion for choosing vocabularies. This paper proposes an approach that chooses vocabularies greedily via AdaBoost. An index quality criterion is designed so that AdaBoost can adjust the weights of the training data. We also describe a parallelized version of the index for large-scale applications. Promising results in near-duplicate image detection experiments demonstrate the efficiency of the new vocabulary construction algorithm and the desirable properties of parallelized vocabulary-based hashing for large-scale applications.
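The idea of greedy, boosting-style vocabulary selection can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the hash of a point is assumed to be the index of its nearest vocabulary point, the quality of a vocabulary is assumed to be the weighted fraction of training near-neighbor pairs that collide under that hash, and the names `nearest_word_hash` and `select_vocabularies` are hypothetical. The paper's actual index quality criterion may differ.

```python
import math


def nearest_word_hash(vocabulary, point):
    """Assumed hash: index of the vocabulary point closest to `point`."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(vocabulary[i], point))


def select_vocabularies(candidates, pairs, rounds):
    """Greedily pick candidate vocabularies with AdaBoost-style reweighting.

    candidates: list of vocabularies (each a list of points).
    pairs: training near-neighbor pairs (p, q) that should collide.
    Returns the indices of the chosen candidates, in selection order.
    """
    weights = [1.0 / len(pairs)] * len(pairs)
    chosen = []
    for _ in range(rounds):
        best, best_err, best_hits = None, None, None
        for c, vocab in enumerate(candidates):
            if c in chosen:
                continue
            # A pair is a "hit" if both points fall in the same bucket.
            hits = [nearest_word_hash(vocab, p) == nearest_word_hash(vocab, q)
                    for p, q in pairs]
            # Weighted error: total weight of pairs that fail to collide.
            err = sum(w for w, h in zip(weights, hits) if not h)
            if best is None or err < best_err:
                best, best_err, best_hits = c, err, hits
        if best is None:  # no unused candidates remain
            break
        chosen.append(best)
        # AdaBoost-style update; the error is clamped (an assumption of this
        # sketch) so that alpha stays finite and non-negative.
        err = min(max(best_err, 1e-9), 0.5)
        alpha = 0.5 * math.log((1.0 - err) / err)
        # Pairs the chosen vocabulary missed gain weight; hits lose weight.
        weights = [w * math.exp(-alpha if h else alpha)
                   for w, h in zip(weights, best_hits)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


if __name__ == "__main__":
    vocab_a = [(0.0, 0.0), (10.0, 10.0)]
    vocab_b = [(0.0, 0.0), (1.0, 0.0)]
    pairs = [((0.4, 0.0), (0.6, 0.0)), ((10.0, 10.0), (9.0, 9.0))]
    print(select_vocabularies([vocab_b, vocab_a], pairs, rounds=2))
```

In this toy setup, each later round favors vocabularies that make pairs missed by earlier choices collide, which is the role the reweighting plays in a boosting-style selection.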
