Near Duplicate Image Detecting Algorithm based on Bag of Visual Word Model

In recent years, near-duplicate image detection has become one of the most important problems in image retrieval, with applications such as detecting copyright violations and forged images. In this paper, we propose a novel approach to automatically detect near-duplicate images based on the bag-of-visual-words model. SIFT descriptors, an effective and widely used way to detect local image features in computer vision, are used to represent the visual content of an image. We then cluster the SIFT features of a given image into several clusters with the K-means algorithm. The centroid of each cluster is regarded as a visual word, and all the centroids together form the visual word vocabulary. To reduce the time cost of the detection process, locality-sensitive hashing is used to map the high-dimensional visual features into a low-dimensional hash bucket space, and the visual features of each image are then converted into a histogram. Next, for a pair of images, we estimate their similarity from local features by computing the distance between their histograms, which allows near-duplicate images to be detected. Finally, a series of experiments is conducted to evaluate performance, and the experimental results are analyzed.
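The pipeline summarized above can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation. Several choices are assumptions made only for illustration: random Gaussian vectors stand in for real 128-dimensional SIFT descriptors, the vocabulary is built from descriptors pooled over the example images, the hash is random-hyperplane LSH, and the histogram distance is half the L1 distance. The names `kmeans`, `nearest_word`, `lsh_buckets`, `image_histogram`, and `histogram_distance` are hypothetical helpers.

```python
import numpy as np

def nearest_word(points, centroids):
    """Index of the closest centroid (visual word) for every descriptor."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns the k centroids (the visual words)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest_word(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def lsh_buckets(vectors, planes):
    """Random-hyperplane LSH: one sign bit per plane, packed into a bucket id."""
    bits = (vectors @ planes.T > 0).astype(np.int64)
    return bits @ (1 << np.arange(planes.shape[0], dtype=np.int64))

def image_histogram(descriptors, vocabulary, planes):
    """Quantize descriptors to visual words, hash the words, count buckets."""
    words = vocabulary[nearest_word(descriptors, vocabulary)]
    buckets = lsh_buckets(words, planes)
    hist = np.bincount(buckets, minlength=2 ** planes.shape[0]).astype(float)
    return hist / hist.sum()

def histogram_distance(h1, h2):
    """Half the L1 distance: 0 for identical histograms, 1 for disjoint ones."""
    return 0.5 * np.abs(h1 - h2).sum()

rng = np.random.default_rng(42)
# Assumption: synthetic vectors stand in for SIFT descriptors of three images.
image_a = rng.normal(size=(200, 128))
image_b = image_a + 0.001 * rng.normal(size=image_a.shape)  # near duplicate of A
image_c = rng.normal(size=(200, 128))                       # unrelated image

vocabulary = kmeans(np.vstack([image_a, image_c]), k=32)
planes = rng.normal(size=(8, 128))  # 8 sign bits give 256 hash buckets

hist_a, hist_b, hist_c = (image_histogram(x, vocabulary, planes)
                          for x in (image_a, image_b, image_c))
d_near = histogram_distance(hist_a, hist_b)
d_far = histogram_distance(hist_a, hist_c)
print("near-duplicate distance:", d_near)
print("unrelated distance:", d_far)
```

In this sketch a pair of images would be flagged as near duplicates when the histogram distance falls below a threshold. The threshold, the vocabulary size, and the number of hash bits are tuning parameters that the abstract does not specify.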
