Paper information - Salient covariance for near-duplicate image and video detection

This paper introduces the covariance matrix of visually salient image features as a compact and robust descriptor for near-duplicate image and video copy detection. We make two novel contributions. First, we present a fast method for computing information-theoretic visual saliency maps, which uses a data-independent fast transform in place of the conventional data-dependent, computationally demanding transforms. Second, we introduce salient covariance (SCOV), the covariance matrix of various image features within the visually salient regions, and use SCOV for near-duplicate image and video copy detection. Experimental results show that the new fast saliency computation improves efficiency without compromising performance, and that SCOV is a very compact and robust feature for near-duplicate image and video copy detection. Compared with popular features such as GIST, SCOV is not only more robust against various manipulations but can also be over 20 times more compact while achieving the same or better performance.
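The idea of a covariance descriptor restricted to salient pixels can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the paper's configuration: the feature set (pixel coordinates, intensity, absolute gradients), the `keep` fraction used to threshold the saliency map, and the function names. The saliency map is taken as an input, since the paper's fast saliency transform is not reproduced here. The distance uses the logarithms of generalized eigenvalues, a common choice for comparing covariance matrices, and may differ from what the paper uses.

```python
import numpy as np

def pixel_features(gray):
    """Stack per-pixel features: x, y, intensity, |dI/dx|, |dI/dy| (assumed set)."""
    gray = gray.astype(float)
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    return np.stack([xs, ys, gray, np.abs(gx), np.abs(gy)], axis=-1)

def salient_covariance(gray, saliency, keep=0.3):
    """Covariance of the features over the most salient `keep` fraction of pixels.

    `saliency` is any map with the same shape as `gray`; how it is computed
    is left open in this sketch.
    """
    feats = pixel_features(gray)
    mask = saliency >= np.quantile(saliency, 1.0 - keep)
    selected = feats[mask]                 # shape (n_salient_pixels, d)
    return np.cov(selected, rowvar=False)  # shape (d, d)

def covariance_distance(c1, c2, eps=1e-9):
    """Distance from the log generalized eigenvalues of (c1, c2)."""
    d = c1.shape[0]
    c1 = c1 + eps * np.eye(d)  # small ridge keeps both matrices invertible
    c2 = c2 + eps * np.eye(d)
    lam = np.linalg.eigvals(np.linalg.solve(c2, c1)).real
    lam = np.clip(lam, 1e-12, None)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

With d features the descriptor is a symmetric d x d matrix, so only d(d+1)/2 values need to be stored (15 for the five features assumed above), independent of image size. Two images would be compared by calling `covariance_distance` on their descriptors and thresholding the result.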
