Fast Content-Based Mining of Web2.0 Videos

The accumulation of many transformed versions of the same original videos on Web2.0 sites degrades the quality of the results presented to users and complicates content management for the provider. Automatically identifying such content links between video sequences can address these difficulties. We put forward a fast solution to this video mining problem, relying on a compact keyframe descriptor and an adapted indexing solution. Two versions are developed: an offline one for mining large databases and an online one for quickly post-processing the results of keyword-based interactive queries. After demonstrating the reliability of the method on a ground truth, its scalability on a database of 10,000 hours of video, and its speed on three interactive queries, we illustrate some results obtained on Web2.0 content.
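
To make the general idea concrete, here is a minimal Python sketch of mining content links between videos from compact keyframe descriptors. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify the descriptor or the index, so the grid-based binary signature (`compact_descriptor`), the exact-match hash buckets, and the function `mine_links` are all hypothetical stand-ins. Keyframes are assumed to be plain 2D lists of grayscale values.

```python
from collections import defaultdict
from itertools import combinations


def compact_descriptor(frame, grid=4):
    """Hypothetical compact descriptor (an assumption, not the paper's):
    one bit per grid cell, set when the cell mean exceeds the mean of all cells.
    Assumes the frame has at least `grid` rows and `grid` columns."""
    height, width = len(frame), len(frame[0])
    cell_means = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * height // grid, (gy + 1) * height // grid)
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            values = [frame[y][x] for y in rows for x in cols]
            cell_means.append(sum(values) / len(values))
    global_mean = sum(cell_means) / len(cell_means)
    return tuple(int(mean > global_mean) for mean in cell_means)


def mine_links(keyframes):
    """Group keyframes by descriptor and link videos that share a descriptor.

    keyframes: iterable of (video_id, frame) pairs.
    Returns {(video_a, video_b): number of distinct descriptors shared}.
    Exact-match bucketing is a simplification standing in for a real
    similarity index.
    """
    buckets = defaultdict(set)
    for video_id, frame in keyframes:
        buckets[compact_descriptor(frame)].add(video_id)

    links = defaultdict(int)
    for videos in buckets.values():
        for a, b in combinations(sorted(videos), 2):
            links[(a, b)] += 1
    return dict(links)
```

Because each bit compares a cell against the mean of all cells, a uniform brightness shift leaves this toy signature unchanged, so a brightened copy of a keyframe falls into the same bucket as the original. A practical system would need a descriptor and an index that tolerate a much wider range of transformations.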
