Hierarchical video content description and summarization using unified semantic and visual similarity

Abstract. Video is increasingly the medium of choice for a variety of communication channels, primarily because of the growth of networked multimedia systems. One way to keep our heads above the video sea is to provide summaries in a more tractable format. Many existing approaches are limited to selecting important units for summarization on the basis of low-level features. Unfortunately, the semantics, content, and structure of a video do not correspond directly to low-level features, even when closed captions, scene detection, and audio signal processing are used. Existing methods have two drawbacks: (1) instead of unfolding the semantics and structure of the video, low-level units usually address only the details, and (2) no important-unit selection strategy based on low-level features can be applied to general videos. Providing users with an overview of the video content at various levels of summarization is essential for more efficient database retrieval and browsing. In this paper, we present a hierarchical video content description and summarization approach supported by a novel joint semantic and visual similarity strategy. To describe the video content efficiently and accurately, a video content description ontology is adopted. Various video processing techniques are then used to construct a semi-automatic video annotation framework. By integrating the acquired content description data, a hierarchical video content structure is constructed through group merging and clustering. Finally, a four-layer video summary with different granularities is assembled to help users unfold the video content progressively. Experiments on real-world videos have validated the effectiveness of the proposed approach.
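
The abstract does not give the form of the joint semantic and visual similarity, so the following Python sketch is only one plausible reading, not the paper's formulation. It assumes each shot carries a normalized color histogram and a set of annotation keywords, and it combines histogram intersection with keyword-set Jaccard overlap through a weight `alpha`. The names `visual_similarity`, `semantic_similarity`, `joint_similarity`, and `alpha`, the dictionary layout, and the sample keywords are all hypothetical.

```python
from typing import Dict, Sequence, Set


def visual_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection of two normalized histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def semantic_similarity(k1: Set[str], k2: Set[str]) -> float:
    """Jaccard overlap of two annotation keyword sets, in [0, 1]."""
    if not k1 and not k2:
        return 0.0  # no annotations on either side: no semantic evidence
    return len(k1 & k2) / len(k1 | k2)


def joint_similarity(shot_a: Dict, shot_b: Dict, alpha: float = 0.5) -> float:
    """Weighted mix of semantic and visual similarity.

    alpha is an assumed weight: 1.0 uses keywords only, 0.0 uses
    histograms only. The paper's actual combination may differ.
    """
    sem = semantic_similarity(shot_a["keywords"], shot_b["keywords"])
    vis = visual_similarity(shot_a["hist"], shot_b["hist"])
    return alpha * sem + (1.0 - alpha) * vis


# Hypothetical shots: 3-bin histograms and hand-picked keywords.
shot_1 = {"hist": [0.5, 0.5, 0.0], "keywords": {"doctor", "surgery"}}
shot_2 = {"hist": [0.25, 0.5, 0.25], "keywords": {"doctor", "dialog"}}
score = joint_similarity(shot_1, shot_2)
```

A score of this kind could then drive group merging or clustering: shots or groups whose joint similarity exceeds a chosen threshold are merged into the next level of the hierarchy.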
