Using Web Photos for Measuring Video Frame Interestingness

In this paper, we present a method that uses web photos to measure the interestingness of frames in a travel video. Web photo collections, such as those on Flickr, tend to contain interesting images because their photos are more carefully taken, composed, and selected. Because these photos have already been chosen as subjectively interesting, they serve as evidence that similar images are also interesting. Our idea is to leverage these web photos to measure the interestingness of video frames. Specifically, we measure the interestingness of each video frame according to its similarity to web photos. The similarity is defined in terms of scene content and composition. We characterize scene content using scale-invariant local features, specifically SIFT keypoints, and we characterize composition by the spatial distribution of those features. Accordingly, we measure the similarity between a web photo and a video frame based on the co-occurrence of SIFT features and the similarity between their spatial distributions. The interestingness of a video frame is then measured by considering how many photos it is similar to and how similar it is to each of them. Our experiments on measuring the frame interestingness of YouTube videos using photos from Flickr show the initial success of our method.
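The scoring idea described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the nearest-neighbour ratio test, the 3x3 grid histogram used as a stand-in for composition, the product used to combine the two terms, and the names `match_features`, `grid_histogram`, `similarity`, `interestingness`, `ratio`, `bins`, and `min_similarity` are all assumptions made for illustration. Features are assumed to be precomputed (for example, SIFT descriptors) with keypoint positions normalised to [0, 1).

```python
import math
from typing import List, Sequence, Tuple

# A feature is (descriptor, (x, y)); x and y are assumed normalised to [0, 1).
Feature = Tuple[Sequence[float], Tuple[float, float]]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_features(frame: List[Feature], photo: List[Feature],
                   ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (frame_index, photo_index) pairs that pass a ratio test.

    Assumption: a brute-force nearest-neighbour search with a ratio test
    stands in for whatever matching scheme the paper actually uses.
    """
    matches = []
    for i, (desc, _) in enumerate(frame):
        ranked = sorted((_dist(desc, d), j) for j, (d, _) in enumerate(photo))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches


def grid_histogram(points: List[Tuple[float, float]], bins: int = 3) -> List[float]:
    """Normalised histogram of point positions over a bins x bins grid."""
    hist = [0.0] * (bins * bins)
    for x, y in points:
        col = min(int(x * bins), bins - 1)
        row = min(int(y * bins), bins - 1)
        hist[row * bins + col] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def similarity(frame: List[Feature], photo: List[Feature], bins: int = 3) -> float:
    """Content term (fraction of frame features matched) times a
    composition term (histogram intersection of matched positions)."""
    matches = match_features(frame, photo)
    if not matches:
        return 0.0
    content = len(matches) / len(frame)
    h_frame = grid_histogram([frame[i][1] for i, _ in matches], bins)
    h_photo = grid_histogram([photo[j][1] for _, j in matches], bins)
    composition = sum(min(a, b) for a, b in zip(h_frame, h_photo))
    return content * composition


def interestingness(frame: List[Feature], photos: List[List[Feature]],
                    min_similarity: float = 0.1) -> float:
    """Sum of similarities to all photos the frame is sufficiently similar to,
    so both the number of similar photos and their similarity contribute."""
    scores = [similarity(frame, photo) for photo in photos]
    return sum(s for s in scores if s >= min_similarity)
```

Under these assumptions, a frame that shares features with many photos and places them in similar image regions scores highest, while a frame with the same content but a different layout scores lower. The brute-force matching here is quadratic in the number of features; an approximate nearest-neighbour index would be the natural replacement at realistic scale.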
