Large-Scale Duplicate Detection for Web Image Search

Finding visually identical images in large image collections is important for many applications, such as intellectual property protection and search result presentation. Several algorithms have been reported in the literature, but they are not well suited to large image collections. In this paper, a novel algorithm is proposed for this setting: each image is compactly represented by a hash code, and only these hash codes are needed to detect duplicate images. In addition, a very efficient search method is implemented that quickly groups images with similar hash codes, which speeds up detection. The experiments show that our algorithm can be both efficient and effective for duplicate detection in Web image search.
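The abstract does not say how the hash codes are computed or how images with similar codes are grouped, so the following is only a minimal sketch under stated assumptions, not the paper's method. It swaps in a simple average hash (block-average a grayscale image to an 8x8 grid, then set one bit per block that is brighter than the mean). It then groups codes within a small Hamming distance using segment bucketing (a multi-index lookup) plus union-find. The names `average_hash`, `hamming`, `group_duplicates`, `GRID`, and `SEGMENTS` are hypothetical, and images are assumed to be already decoded into rows of 0-255 grayscale values.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical parameters (not from the paper): an 8x8 grid gives a 64-bit
# code, split into 4 segments of 16 bits for bucketing.
GRID = 8
SEGMENTS = 4
SEG_BITS = GRID * GRID // SEGMENTS


def average_hash(pixels):
    """Stand-in hash: block-average a grayscale image (a list of equal-length
    rows of 0-255 values) down to GRID x GRID, then set bit i when block i is
    brighter than the mean of all blocks."""
    h, w = len(pixels), len(pixels[0])
    assert h >= GRID and w >= GRID, "image must be at least GRID x GRID"
    blocks = []
    for by in range(GRID):
        for bx in range(GRID):
            y0, y1 = by * h // GRID, (by + 1) * h // GRID
            x0, x1 = bx * w // GRID, (bx + 1) * w // GRID
            vals = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            blocks.append(sum(vals) / len(vals))
    mean = sum(blocks) / len(blocks)
    code = 0
    for i, v in enumerate(blocks):
        if v > mean:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def group_duplicates(codes, max_dist=3):
    """Group image ids whose codes differ in at most max_dist bits.

    With max_dist < SEGMENTS, two such codes must agree exactly on at least
    one segment (pigeonhole), so only ids sharing a (segment, value) bucket
    are compared instead of all pairs."""
    assert max_dist < SEGMENTS
    mask = (1 << SEG_BITS) - 1
    buckets = defaultdict(list)
    for img_id, code in codes.items():
        for s in range(SEGMENTS):
            buckets[(s, (code >> (s * SEG_BITS)) & mask)].append(img_id)

    parent = {i: i for i in codes}  # union-find forest over image ids

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in buckets.values():
        for a, b in combinations(members, 2):
            if hamming(codes[a], codes[b]) <= max_dist:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in codes:
        groups[find(i)].append(i)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Tiny synthetic demo: a gradient, a brightened copy, and an unrelated pattern.
grad = [[x * 4 for x in range(64)] for _ in range(64)]
brighter = [[min(255, v + 20) for v in row] for row in grad]
checker = [[255 if (x // 8 + y // 8) % 2 else 0 for x in range(64)] for y in range(64)]
codes = {name: average_hash(img)
         for name, img in [("grad", grad), ("brighter", brighter), ("checker", checker)]}
print(group_duplicates(codes))  # → [['brighter', 'grad']]
```

Bucketing on exact segment values keeps the candidate set small: two 64-bit codes within three bits of each other must match exactly on at least one of the four 16-bit segments, so pairs that never share a bucket are skipped without computing their Hamming distance. The brightened copy maps to the same code as the original because the average hash thresholds each block against the image's own mean.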
