A Unified Framework for Video Summarization, Browsing, and Retrieval

This chapter reviews recent research progress in multimodal analysis, representation, summarization, browsing, and retrieval. It introduces the video table of contents (ToC), the highlights, and the index, and presents techniques for constructing them. It then proposes a unified framework for video summarization, browsing, and retrieval that lets a user move back and forth between browsing and retrieval. Weighted links are an essential part of this framework. For scripted content, links can be established between index entities and the scenes, groups, shots, and key frames of the ToC structure; for unscripted content, they can be established between index entities and finer-resolution highlights, highlight candidates, audio-visual markers, and plays/breaks. For scripted content, the focus is on the links between index entities and shots, because shots are the building blocks of the ToC. For unscripted content, an example of going from the visual index to the highlights is shown. The chapter also recapitulates the key components of video highlights extraction and of video retrieval, which is concerned with returning similar video clips to a user given a video query.
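As an illustration of how weighted links could connect an index to the ToC, the following minimal Python sketch stores a weight for each (index entity, shot) pair and supports lookups in both directions. The class name `WeightedLinks`, its methods, the entity and shot identifiers, and the weight values are all hypothetical and chosen for this example; the abstract does not specify how the weights are computed or stored.

```python
from collections import defaultdict


class WeightedLinks:
    """Hypothetical bidirectional store of weighted entity-to-shot links."""

    def __init__(self):
        # entity -> {shot: weight} and shot -> {entity: weight}
        self._by_entity = defaultdict(dict)
        self._by_shot = defaultdict(dict)

    def add(self, entity, shot, weight):
        """Record one link; the weight is assumed to be a relevance score."""
        self._by_entity[entity][shot] = weight
        self._by_shot[shot][entity] = weight

    @staticmethod
    def _ranked(links, top_k):
        # Highest weight first; ties are broken by name for determinism.
        return sorted(links.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

    def shots_for(self, entity, top_k=3):
        """Retrieval to browsing: jump from an index entity into the ToC."""
        return self._ranked(self._by_entity.get(entity, {}), top_k)

    def entities_for(self, shot, top_k=3):
        """Browsing to retrieval: jump from a ToC shot back to the index."""
        return self._ranked(self._by_shot.get(shot, {}), top_k)


# Made-up example data, for illustration only.
links = WeightedLinks()
links.add("face:anchor", "shot_03", 0.9)
links.add("face:anchor", "shot_07", 0.4)
links.add("color:green_field", "shot_07", 0.8)

from_index = links.shots_for("face:anchor")
from_toc = links.entities_for("shot_07")
print(from_index)
print(from_toc)
```

Keeping both directions in memory is one simple way to support the back-and-forth navigation the framework describes; the same idea would extend to scenes, groups, key frames, or highlights by using their identifiers in place of shot identifiers.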
