Automatic construction of multi-document multimedia summaries

With the growing quantity of video on the Internet, multimedia information processing has become an active research topic in recent years. Among its techniques, video summarization has become an important tool, and researchers in the multimedia community have proposed several successful approaches. In this thesis, we propose a novel video summarization algorithm, Video-MMR (Video Maximal Marginal Relevance), which relies on visual information and mimics MMR (Maximal Marginal Relevance) from text summarization. Video-MMR is a generic algorithm that does not depend on the video genre and is suitable for summarizing both a single video and a set of videos. Building on Video-MMR, we develop the following contributions:

1. Because visual information is typically the most important cue compared with acoustic and textual information, we first address the limitations of Video-MMR and propose a refinement, Video-MMR2, that still exploits only visual information.

2. Since visual information is only one of several cues in a video, we propose further variants of Video-MMR that exploit additional multimedia information such as text or audio. We extend Video-MMR to AV-MMR (Audio Video Maximal Marginal Relevance), Balanced AV-MMR, OB-MMR (Optimized Balanced Audio Video Maximal Marginal Relevance) and TV-MMR (Text Video Maximal Marginal Relevance). These multimedia MMR algorithms are generic and outperform Video-MMR when the text and audio information in the video is taken into account.

3. In addition to the summarization algorithms, we optimize the presentation of video summaries, since a good summary can be undermined by a poor presentation. For a static summary made of keyframes and keywords, we suggest the number of frames and text grams to display; for a dynamic summary composed of video segments, we optimize the average duration of the segments.

4. Video summarization needs an evaluation measure for new approaches. Many current measures are based on human assessment, and the automatic evaluation of video summaries is still an open problem. We propose VERT (Video Evaluation by Relevant Threshold), which mimics the BLEU and ROUGE evaluation measures from the text community and makes evaluation automatic with the help of only a few human assessments.

We describe all of these approaches in detail and present experimental results. Together, they form a framework for video summarization that comprises a summarization algorithm using the visual cue, variants exploiting additional multimedia cues, an optimization of summary presentation, and a new evaluation method for video summaries. This framework allows multiple videos to be managed and browsed more efficiently.
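The MMR idea that Video-MMR borrows from text summarization can be illustrated with a short greedy selection loop. The following is only a minimal sketch under stated assumptions, not the thesis's implementation: frames are assumed to be plain feature vectors, similarity is assumed to be cosine similarity, relevance is taken as the mean similarity to the frames not yet selected, and the names `cosine`, `mmr_select` and the weight `lam` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(frames, k, lam=0.7):
    """Greedily pick up to k frame indices with an MMR-style criterion.

    Each step picks the frame that maximizes
        lam * relevance - (1 - lam) * redundancy
    where relevance is the mean similarity to the other unselected frames
    and redundancy is the highest similarity to an already selected frame.
    Both definitions are illustrative assumptions.
    """
    n = len(frames)
    sim = [[cosine(frames[i], frames[j]) for j in range(n)] for i in range(n)]
    selected = []
    remaining = set(range(n))
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for i in sorted(remaining):
            others = [j for j in remaining if j != i]
            relevance = sum(sim[i][j] for j in others) / len(others) if others else 0.0
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy input: two groups of near-duplicate "frames" (made-up feature vectors).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
summary = mmr_select(frames, k=2)
```

On this toy input the redundancy term pushes the second pick away from the first, so the two selected frames come from different groups. Under the same assumptions, the same loop applies to a set of videos by pooling the frames of all videos into `frames`.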
