A Novel Video Summarization Based on Mining the Story-Structure and Semantic Relations Among Concept Entities

Video summarization techniques have long been proposed to give viewers a comprehensive understanding of the whole story in a video. Roughly speaking, existing approaches fall into two types: static storyboards and dynamic skimming. However, although these traditional methods give users brief summaries, they do not provide a concept-organized, systematic view. In this paper, we present a structural video content browsing system and a novel summarization method that use four kinds of entities (who, what, where, and when) to establish the framework of the video contents. With this indexed information, the structure of the story can be built up according to the characters, the things, the places, and the time. Users can therefore not only browse the video efficiently but also focus on what interests them via the browsing interface. To construct the underlying system, we employ the maximum entropy criterion to integrate visual and text features extracted from video frames and speech transcripts, generating high-level concept entities. A novel concept expansion method is introduced to explore the associations among these entities. After constructing the relational graph, we exploit a graph entropy model to detect meaningful shots and relations, which serve as indices for users. The results demonstrate that our system can achieve better performance and information coverage.
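As a rough illustration of how a graph entropy model might rank entities in a relational graph, the sketch below scores each node by how much the graph's entropy drops when that node is removed. This is not the paper's formulation: the entropy definition (Shannon entropy of the normalized degree distribution), the scoring rule, the function names, and the toy who/what/where/when entities are all assumptions made for the example.

```python
import math


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def graph_entropy(adj):
    """Shannon entropy (bits) of the normalized degree distribution (assumed definition)."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    total = sum(degrees)
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log2(d / total) for d in degrees if d > 0)


def remove_node(adj, node):
    """Return a copy of the graph without `node` and its incident edges."""
    return {u: {v for v in nbrs if v != node} for u, nbrs in adj.items() if u != node}


def rank_by_entropy_drop(adj):
    """Rank nodes by the entropy lost when each is removed, largest drop first."""
    base = graph_entropy(adj)
    scores = {n: base - graph_entropy(remove_node(adj, n)) for n in adj}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical concept entities; the labels are invented for illustration only.
edges = [
    ("who:anchor", "what:election"),
    ("who:anchor", "where:studio"),
    ("who:anchor", "when:evening"),
]
ranking = rank_by_entropy_drop(build_graph(edges))
print(ranking[0])
```

In this toy graph the entity connected to every other entity ranks first, because removing it leaves no relations at all. A real system would presumably weight edges by relation strength rather than treat them as unweighted links.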
