Evaluation of Near-Duplicate Image Retrieval Algorithms for the Identification of Celebrities in Web Images

Near-duplicate image retrieval is a classical research problem in computer vision, for which a large number of diverse approaches have been proposed. Recent studies have shown that it can serve as an intermediate step in search-based celebrity identification, given the huge volume of user-tagged or text-surrounded celebrity images on the web. However, the effectiveness of existing near-duplicate image retrieval methods for this task remains unclear. To address this issue, this paper presents a comprehensive, structured study of existing near-duplicate image retrieval methods. Four representative methods, namely hash signature, mean SSIM, BoVW with SIFT features, and ARG, are experimentally evaluated on a self-constructed dataset of 24,762 images of 15 top-searched celebrities, collected using 6 news search engines and the Google image search engine. The experimental results reveal that, compared with methods based on global features, those based on local features are usually more appropriate for celebrity identification in web images, as they handle partial-duplicate and scene-similar images better. In particular, BoVW with SIFT features is recommended, as it provides the best trade-off between on-line speed and retrieval accuracy.
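
To make the global-feature end of this comparison concrete, the following is a minimal sketch of one common kind of hash signature, an average hash compared by Hamming distance. It is an illustration only: the abstract does not specify which hash the paper evaluates, and the names `average_hash` and `hamming`, the 8x8 grid, and the plain nested-list image representation are all assumptions made here.

```python
from typing import List

# Assumption: a grayscale image is a row-major grid of pixel intensities.
Image = List[List[float]]


def average_hash(img: Image, size: int = 8) -> int:
    """Return a size*size-bit signature of the image.

    The image is divided into a size x size grid of blocks; each bit is 1
    if the block's mean intensity exceeds the mean over all blocks.
    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = len(img), len(img[0])
    blocks = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            vals = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            blocks.append(sum(vals) / len(vals))
    mean = sum(blocks) / len(blocks)
    bits = 0
    for v in blocks:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical 16x16 test image: a horizontal intensity gradient.
    img = [[x * 16.0 for x in range(16)] for _ in range(16)]
    brighter = [[v + 10.0 for v in row] for row in img]
    print(hamming(average_hash(img), average_hash(brighter)))
```

Because the signature summarizes the whole image, a uniform brightness shift leaves it unchanged, whereas a crop or a change of background alters many blocks at once. This is consistent with the abstract's finding that global features cope less well with partial-duplicate and scene-similar images than local features do.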
