Event Driven Web Video Summarization by Tag Localization and Key-Shot Identification

With the explosive growth of web videos on the Internet, it has become challenging to efficiently browse hundreds or even thousands of videos. When searching with an event query, users are often overwhelmed by the vast quantity of web videos returned by search engines. Exploring such results is time consuming and degrades the user experience. In this paper, we present an approach for event driven web video summarization by tag localization and key-shot mining. We first localize the tags associated with each video to its individual shots. Then, we estimate the relevance of each shot to the event query by matching the shot-level tags with the query. After that, we identify a set of key-shots among the shots with high relevance scores by exploiting the repeated occurrence characteristic of key sub-events. Following the scheme in [6] and [22], we provide two types of summaries: threaded video skimming and a visual-textual storyboard. Experiments are conducted on a corpus that contains 60 queries and more than 10 000 web videos. The evaluation demonstrates the effectiveness of the proposed approach.
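The relevance-estimation and key-shot steps described above can be illustrated with a toy sketch. The abstract does not specify the matching function or the mining procedure, so everything here is an assumption made for illustration: relevance is taken as the fraction of query terms found among a shot's localized tags, "repeated occurrence" is approximated by counting how many distinct videos contain a shot with the same near-duplicate group id, and the names `shot_relevance`, `select_key_shots`, `signature`, `min_relevance`, and `min_repeats` are hypothetical.

```python
def shot_relevance(shot_tags, query_terms):
    """Assumed measure: fraction of query terms present in the shot-level tags."""
    query = {t.lower() for t in query_terms}
    tags = {t.lower() for t in shot_tags}
    return len(query & tags) / len(query) if query else 0.0


def select_key_shots(shots, query_terms, min_relevance=0.5, min_repeats=2):
    """Return near-duplicate group ids that recur among relevant shots.

    shots: list of dicts with keys 'video', 'tags', and 'signature', where
    'signature' is a hypothetical precomputed near-duplicate group id.
    """
    # Keep only shots whose localized tags match the query well enough.
    relevant = [
        s for s in shots
        if shot_relevance(s["tags"], query_terms) >= min_relevance
    ]

    # Record which distinct videos each near-duplicate group appears in.
    videos_per_sig = {}
    for s in relevant:
        videos_per_sig.setdefault(s["signature"], set()).add(s["video"])

    # A group counts as a key-shot candidate if it recurs in enough videos.
    key = [sig for sig, vids in videos_per_sig.items() if len(vids) >= min_repeats]

    # Most frequently repeated groups first; ties broken by id for determinism.
    return sorted(key, key=lambda sig: (-len(videos_per_sig[sig]), sig))
```

With this sketch, a shot group is kept only if it is both relevant to the query and repeated in at least `min_repeats` videos; both thresholds are arbitrary illustrative values, not settings from the paper.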

[1]  Tat-Seng Chua,et al.  Video reference: question answering on YouTube , 2009, MM '09.

[2]  Meng Wang,et al.  ShotTagger: tag location for internet videos , 2011, ICMR.

[3]  Tat-Seng Chua,et al.  Exploring large scale data for multimedia QA: an initial study , 2010, CIVR '10.

[4]  Min Chen,et al.  Video Semantic Event/Concept Detection Using a Subspace-Based Multimedia Data Mining Framework , 2008, IEEE Transactions on Multimedia.

[5]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[6]  Chong-Wah Ngo,et al.  Threading and Autodocumenting News Videos , 2006.

[7]  Tat-Seng Chua,et al.  From text question-answering to multimedia QA on web-scale media resources , 2009, LS-MMRM '09.

[8]  Hung-Khoon Tan,et al.  Beyond search: Event-driven summarization for web videos , 2011, TOMCCAP.

[9]  Tat-Seng Chua,et al.  Mediapedia: Mining Web Knowledge to Construct Multimedia Encyclopedia , 2010, MMM.

[10]  Yue-Shi Lee,et al.  CLVQ: cross-language video question/answering system , 2004, IEEE Sixth International Symposium on Multimedia Software Engineering.

[11]  Ivor W. Tsang,et al.  Visual Event Recognition in Videos by Learning from Web Data , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Chong-Wah Ngo,et al.  Practical elimination of near-duplicates from web video search , 2007, ACM Multimedia.

[13]  Xuelong Li,et al.  Modality Mixture Projections for Semantic Video Event Detection , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[14]  Jianping Fan,et al.  Hierarchical video content description and summarization using unified semantic and visual similarity , 2003, Multimedia Systems.

[15]  Hung-Khoon Tan,et al.  Event driven summarization for web videos , 2009, WSM '09.

[16]  Meng Wang,et al.  Semi-supervised kernel density estimation for video annotation , 2009, Comput. Vis. Image Underst.

[17]  Markus Koch,et al.  Learning automatic concept detectors from online video , 2010, Comput. Vis. Image Underst.

[18]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[19]  Jhing-Fa Wang,et al.  A Novel Video Summarization Based on Mining the Story-Structure and Semantic Relations Among Concept Entities , 2009, IEEE Transactions on Multimedia.

[20]  M. Kendall Statistical Methods for Research Workers , 1937, Nature.

[21]  Bernard Mérialdo,et al.  Multi-document video summarization , 2009, 2009 IEEE International Conference on Multimedia and Expo.

[22]  Rong Yan,et al.  Semantic concept-based query expansion and re-ranking for multimedia retrieval , 2007, ACM Multimedia.

[23]  Pablo Rodriguez,et al.  I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system , 2007, IMC '07.

[24]  Ramakant Nevatia,et al.  Event Detection and Analysis from Video Streams , 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[25]  Xian-Sheng Hua,et al.  Finding image exemplars using fast sparse affinity propagation , 2008, ACM Multimedia.

[26]  Jintao Li,et al.  The use of topic evolution to help users browse and find answers in news video corpus , 2007, ACM Multimedia.

[27]  Meng Wang,et al.  Unified Video Annotation via Multigraph Learning , 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[28]  Sheng Tang,et al.  TRECVID 2007 High-Level Feature Extraction By MCG-ICT-CAS , 2007, TRECVID.

[29]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011.

[30]  Mark Sanderson,et al.  Automatic video tagging using content redundancy , 2009, SIGIR.

[31]  Grace Hui Yang,et al.  Structured use of external knowledge for event-based open domain question answering , 2003, SIGIR.

[32]  Jay F. Nunamaker,et al.  Question answering on lecture videos: a multifaceted approach , 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 2004.

[33]  Topical summarization of web videos by visual-text time-dependent alignment , 2010, ACM Multimedia.

[34]  Stefan M. Rüger,et al.  NNk Networks for Content-Based Image Retrieval , 2004, ECIR.

[35]  Meng Wang,et al.  Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation , 2009, IEEE Transactions on Multimedia.

[36]  Hung-Khoon Tan,et al.  Near-Duplicate Keyframe Identification With Interest Point Matching and Pattern Learning , 2007, IEEE Transactions on Multimedia.

[37]  Adrian Ulges,et al.  Identifying relevant frames in weakly labeled videos for training concept detectors , 2008, CIVR '08.

[38]  Benoit Huet,et al.  Automatic video summarization , 2006.

[39]  Jialie Shen,et al.  Personalized video similarity measure , 2011, Multimedia Systems.

[40]  Bernard Mérialdo,et al.  Multi-video summarization based on AV-MMR , 2010, 2010 International Workshop on Content Based Multimedia Indexing (CBMI).

[41]  Mark Craven,et al.  Multiple-Instance Active Learning , 2007, NIPS.

[42]  David A. Forsyth,et al.  Towards auto-documentary: tracking the evolution of news stories , 2004, MULTIMEDIA '04.

[43]  Harry W. Agius,et al.  Video summarisation: A conceptual framework and survey of the state of the art , 2008, J. Vis. Commun. Image Represent.

[44]  Jiangchuan Liu,et al.  Understanding the Characteristics of Internet Short Video Sharing: YouTube as a Case Study , 2007, ArXiv.

[45]  Bernard Mérialdo,et al.  Comparison of Multiepisode Video Summarization Algorithms , 2003, EURASIP J. Adv. Signal Process.

[46]  Chong-Wah Ngo,et al.  Threading and autodocumenting news videos: a promising solution to rapidly browse news topics , 2006, IEEE Signal Processing Magazine.

[47]  Sheng Tang,et al.  TRECVID 2008 Participation by MCG-ICT-CAS , 2008, TRECVID.