Fast near duplicate detection for personal image collections

Due to the rapid growth of personal image collections, there is increasing interest in the automatic detection of near duplicates. In this paper, we propose a novel, fast near-duplicate detection framework that takes advantage of heterogeneous features such as EXIF data, global image histograms, and local features. To improve the accuracy of local feature matching, we have developed a structure matching algorithm that takes into account a local feature's neighborhood, which can effectively reject mismatches. In addition, we developed a computation-sensitive cascade framework that combines stage classifiers trained on different feature spaces with different computational costs. This method can quickly accept easily identified duplicates using only cheap features, without the need to extract more sophisticated but expensive ones. Our experiments show very promising results for the new approach compared with existing approaches, in terms of both efficiency and effectiveness.
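To make the cascade idea concrete, the following is a minimal sketch in Python, not the implementation evaluated in this paper. It orders three stages from cheapest to most expensive (EXIF, global histogram, local features) and stops at the first stage that is confident. Everything specific here is an assumption made for illustration: the `Stage` class, the toy scoring functions, the dictionary layout of an image, the threshold values, and the early-reject rule (the abstract only describes early acceptance). In the actual framework the stage classifiers are trained, and local feature matching uses the structure matching algorithm rather than a set overlap.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One cascade stage (hypothetical structure, for illustration only)."""
    name: str
    score: Callable[[dict, dict], float]  # similarity in [0, 1]
    accept_at: float  # score >= accept_at: declare duplicate and stop
    reject_at: float  # score <= reject_at: declare non-duplicate and stop


def exif_score(a: dict, b: dict) -> float:
    # Toy EXIF cue: photos taken close together in time score near 1.
    return 1.0 / (1.0 + abs(a["time"] - b["time"]))


def hist_score(a: dict, b: dict) -> float:
    # Histogram intersection; assumes both histograms are normalized to sum to 1.
    return sum(min(x, y) for x, y in zip(a["hist"], b["hist"]))


def local_score(a: dict, b: dict) -> float:
    # Stand-in for local feature matching: Jaccard overlap of keypoint ids.
    union = a["keypoints"] | b["keypoints"]
    return len(a["keypoints"] & b["keypoints"]) / len(union) if union else 0.0


# Stages ordered by increasing cost; thresholds are made up, not trained.
STAGES: List[Stage] = [
    Stage("exif", exif_score, accept_at=0.9, reject_at=0.001),
    Stage("histogram", hist_score, accept_at=0.95, reject_at=0.3),
    Stage("local", local_score, accept_at=0.5, reject_at=0.5),  # always decides
]


def cascade_decide(a: dict, b: dict, stages: List[Stage]) -> Tuple[bool, str]:
    """Return (is_duplicate, name of the stage that made the decision)."""
    for stage in stages:
        s = stage.score(a, b)  # later, costlier scores are computed only if needed
        if s >= stage.accept_at:
            return True, stage.name
        if s <= stage.reject_at:
            return False, stage.name
    # No stage was confident: fall back to "not a duplicate".
    return False, stages[-1].name
```

With this layout, a pair of photos taken a fraction of a second apart is accepted at the EXIF stage, so neither the histogram nor the local features are ever computed for it. Only ambiguous pairs pay for the expensive stages.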
