Hash functions for near duplicate image retrieval

This paper proposes new hash functions for indexing local image descriptors. These functions are first applied and evaluated as a range neighbor search algorithm, and we show that they obtain results similar to those of several state-of-the-art algorithms. In the context of near-duplicate image retrieval, we then integrate the proposed hash functions into a bag-of-words approach. Most other methods use a k-means-based vocabulary, so they require an off-line learning stage, and the highest performance is obtained when the vocabulary is learned on the searched database. For applications where images are often added to or removed from the searched dataset, the learning stage must therefore be repeated regularly to maintain high recall. We show that a bag-of-words approach built on our hash functions achieves recall similar to that of a bag of words whose k-means vocabulary is learned on the searched dataset, while requiring no learning stage at all. It is thus well suited to near-duplicate image retrieval applications where the dataset evolves regularly, since the vocabulary never needs to be updated to guarantee the best performance.
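The abstract does not define the proposed hash functions, so the sketch below substitutes a well-known data-independent family instead: the p-stable (Gaussian) LSH hash h(v) = floor((a·v + b) / w) of Datar, Immorlica, Indyk and Mirrokni (SCG 2004). It is meant only to illustrate the two uses described above, namely approximate range neighbor search (bucket candidates filtered by exact distance) and a bag of words whose "visual words" are hash keys, so no k-means vocabulary has to be learned and images can be added or removed by editing postings only. The class and function names (`PStableHash`, `HashBagOfWords`, `build_buckets`, `range_query`), the parameters `k`, `w` and `seed`, and the simple vote-counting score are all hypothetical choices, not the authors' method.

```python
import math
import random
from collections import Counter, defaultdict


class PStableHash:
    """k concatenated p-stable (Gaussian) LSH functions, Datar et al. style.

    Stand-in only: this is NOT the hash family proposed in the paper.
    """

    def __init__(self, dim, k=8, w=4.0, seed=0):
        rng = random.Random(seed)
        # Each projection a_i has i.i.d. N(0, 1) entries; each offset b_i ~ U[0, w).
        # Nothing here is fitted to the database, so there is no learning stage.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        self.b = [rng.uniform(0.0, w) for _ in range(k)]
        self.w = w

    def key(self, v):
        # h_i(v) = floor((a_i . v + b_i) / w); the tuple of k integers is the bucket key.
        return tuple(
            math.floor((sum(x * y for x, y in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )


def build_buckets(hasher, database):
    """Hash every database vector once: bucket key -> list of vector indices."""
    buckets = defaultdict(list)
    for i, v in enumerate(database):
        buckets[hasher.key(v)].append(i)
    return buckets


def range_query(hasher, buckets, database, q, r):
    # Candidates are the vectors in q's bucket; keep those within Euclidean distance r.
    # With a single table, true neighbours in other buckets are missed.
    return [i for i in buckets.get(hasher.key(q), []) if math.dist(database[i], q) <= r]


class HashBagOfWords:
    """Inverted index whose visual words are hash keys (hypothetical sketch)."""

    def __init__(self, hasher):
        self.hasher = hasher
        self.index = defaultdict(Counter)  # word -> Counter{image_id: occurrences}

    def add_image(self, image_id, descriptors):
        for d in descriptors:
            self.index[self.hasher.key(d)][image_id] += 1

    def remove_image(self, image_id):
        # Only this image's postings change; no vocabulary is retrained.
        for word in list(self.index):
            postings = self.index[word]
            postings.pop(image_id, None)
            if not postings:
                del self.index[word]

    def query(self, descriptors):
        # Simple voting: each query descriptor votes for images sharing its bucket.
        votes = Counter()
        for d in descriptors:
            for image_id, count in self.index.get(self.hasher.key(d), {}).items():
                votes[image_id] += count
        return votes.most_common()


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 16
    db = {name: [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)] for name in ("a", "b")}
    bow = HashBagOfWords(PStableHash(dim))
    for name, descs in db.items():
        bow.add_image(name, descs)  # indexing starts immediately, no training pass
    near_dup = [[x + rng.gauss(0, 0.01) for x in d] for d in db["a"]]
    print(bow.query(near_dup)[:2])
```

A single hash table like this one can miss true neighbours that fall just across a bucket boundary; LSH schemes usually mitigate this with several independent tables, at the cost of extra memory.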
