An Efficient Approach to Web Near-Duplicate Image Detection

This paper presents an improved bag-of-words (BoW) framework for detecting near-duplicate images on the Web and makes three main contributions. First, SIFT feature descriptors are encoded using Locality-constrained Linear Coding (LLC) combined with a spatial pyramid. Second, a weighted Chi-square distance metric is proposed for comparing two histograms, together with an inverted indexing scheme for fast similarity evaluation. Third, a 6K dataset covering eight categories of objects is built; it is also applicable to image retrieval and classification and will be made publicly available in the future. We evaluate our technique on two benchmarks: our 6K dataset and the publicly available University of Kentucky Benchmark (UKB). The experimental results indicate that our approach to Web Near-Duplicate Image Detection (Web-NDID) is both effective and efficient, and that it outperforms several state-of-the-art methods.
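
To make the second contribution concrete, the following is a minimal sketch of how a weighted Chi-square distance could be evaluated through an inverted index. It is not the authors' implementation: the abstract does not specify the weighting scheme, so the per-bin weights `w`, the sparse dictionary histograms, and the names `weighted_chi2` and `InvertedIndex` are all assumptions made for illustration. The sketch assumes non-negative histogram values and relies on the identity (a - b)^2 / (a + b) = a + b - 4ab / (a + b), so only bins that are non-zero in both histograms need to be visited at query time.

```python
from collections import defaultdict


def weighted_chi2(h, g, w):
    """Direct weighted Chi-square distance between two sparse histograms.

    h, g: dicts mapping bin id -> non-negative value.
    w:    dict mapping bin id -> weight (hypothetical; defaults to 1.0).
    """
    total = 0.0
    for k in set(h) | set(g):
        a, b = h.get(k, 0.0), g.get(k, 0.0)
        if a + b > 0:
            total += w.get(k, 1.0) * (a - b) ** 2 / (a + b)
    return total


class InvertedIndex:
    """Illustrative inverted index over sparse histograms (assumed design)."""

    def __init__(self, weights):
        self.weights = weights
        self.postings = defaultdict(list)  # bin id -> [(image id, value)]
        self.mass = {}                     # image id -> sum_k w_k * value_k

    def add(self, image_id, hist):
        self.mass[image_id] = sum(
            self.weights.get(k, 1.0) * v for k, v in hist.items() if v > 0
        )
        for k, v in hist.items():
            if v > 0:
                self.postings[k].append((image_id, v))

    def query(self, hist):
        """Return (distance, image id) pairs sorted by increasing distance."""
        q_mass = sum(
            self.weights.get(k, 1.0) * a for k, a in hist.items() if a > 0
        )
        overlap = defaultdict(float)
        for k, a in hist.items():
            if a <= 0:
                continue
            wk = self.weights.get(k, 1.0)
            # Only images that share this non-zero bin are touched.
            for image_id, b in self.postings.get(k, ()):
                overlap[image_id] += wk * 4.0 * a * b / (a + b)
        return sorted(
            (q_mass + m - overlap.get(i, 0.0), i) for i, m in self.mass.items()
        )
```

Under these assumptions, `InvertedIndex.query` returns the same distances as calling `weighted_chi2` against every stored histogram, but the inner loop only visits postings for bins that are non-zero in the query, which is where an inverted index saves work on sparse histograms.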
