Hierarchical video summarization based on context clustering

Our video personalization and summarization system dynamically generates a personalized video summary based on user preferences and the usage environment. The three-tier system adopts a server-middleware-client architecture to maintain, select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. In this paper, the metadata includes visual semantic annotations and automatic speech transcriptions. The personalization and summarization engine in the middleware selects the optimal set of desired video segments by matching shot annotations and sentence transcripts against user preferences. Beyond finding the desired content, the objective is to present a coherent summary. Among the diverse methods for creating summaries, we focus on the challenges of generating a hierarchical video summary based on context information. Our summarization algorithm takes three inputs to generate the hierarchical video summary: (1) MPEG-7 metadata descriptions of the content on the server, (2) user preference and usage environment declarations from the user client, and (3) context information, including an MPEG-7 controlled term list and classification scheme. Each shot in a video sequence is assigned a description and a relevance score. Based on these shot descriptions, context clustering groups consecutive similar shots into hierarchical scene representations. The clustering relies on the available context information, which may be derived from domain knowledge or rules engines. Finally, the selection of structured video segments for the hierarchical summary efficiently balances scene representation against shot selection.
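The clustering and selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify a similarity measure or a selection rule, so the sketch assumes Jaccard overlap of annotation terms as a stand-in for the MPEG-7 context information, a fixed similarity threshold, and a greedy two-pass selection under a duration budget. The names `Shot`, `cluster_consecutive`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    index: int          # position of the shot in the video sequence
    duration: float     # shot length in seconds
    terms: frozenset    # annotation terms (assumed to come from a controlled term list)
    relevance: float    # relevance score against the user preferences


def jaccard(a: frozenset, b: frozenset) -> float:
    """Term overlap between two shots; an assumed stand-in for context similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_consecutive(shots, threshold=0.5):
    """Group consecutive shots into scenes while neighbouring shots stay similar."""
    scenes = []
    for shot in shots:
        if scenes and jaccard(scenes[-1][-1].terms, shot.terms) >= threshold:
            scenes[-1].append(shot)   # similar to the previous shot: same scene
        else:
            scenes.append([shot])     # dissimilar: start a new scene
    return scenes


def summarize(scenes, budget):
    """Greedy selection under a duration budget (an assumed policy).

    Pass 1 favours scene representation: the most relevant shot of each scene.
    Pass 2 favours shot selection: remaining shots, most relevant first.
    """
    chosen, remaining = [], budget
    representatives = [max(scene, key=lambda s: s.relevance) for scene in scenes]
    others = [s for scene in scenes for s in scene if s not in representatives]
    for group in (representatives, others):
        for shot in sorted(group, key=lambda s: -s.relevance):
            if shot.duration <= remaining:
                chosen.append(shot)
                remaining -= shot.duration
    return sorted(chosen, key=lambda s: s.index)   # restore temporal order


# Illustrative data only; the terms and scores are made up for this sketch.
shots = [
    Shot(0, 4.0, frozenset({"outdoors", "sky"}), 0.2),
    Shot(1, 3.0, frozenset({"outdoors", "sky", "people"}), 0.9),
    Shot(2, 5.0, frozenset({"indoors", "face"}), 0.6),
    Shot(3, 2.0, frozenset({"indoors", "face"}), 0.4),
    Shot(4, 4.0, frozenset({"outdoors", "car"}), 0.8),
]
scenes = cluster_consecutive(shots)
summary = summarize(scenes, budget=10.0)
```

Here the two-level structure (scenes containing shots) plays the role of the hierarchy; a deeper hierarchy could be obtained by clustering the scenes again with a looser threshold, although that choice is also an assumption of this sketch.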
