Detection and representation of scenes in videos

This paper presents a method to perform a high-level segmentation of videos into scenes. A scene can be defined as a subdivision of a play in which either the setting is fixed, or when it presents continuous action in one place. We exploit this fact and propose a novel approach for clustering shots into scenes by transforming this task into a graph partitioning problem. This is achieved by constructing a weighted undirected graph called a shot similarity graph (SSG), where each node represents a shot and the edges between the shots are weighted by their similarity based on color and motion information. The SSG is then split into subgraphs by applying the normalized cuts for graph partitioning. The partitions so obtained represent individual scenes in the video. When clustering the shots, we consider the global similarities of shots rather than the individual shot pairs. We also propose a method to describe the content of each scene by selecting one representative image from the video as a scene key-frame. Recently, DVDs have become available with a chapter selection option where each chapter is represented by one image. Our algorithm automates this objective which is useful for applications such as video-on-demand, digital libraries, and the Internet. Experiments are presented with promising results on several Hollywood movies and one sitcom.

[1]  Boon-Lock Yeo,et al.  Segmentation of Video by Clustering and Graph Analysis , 1998, Comput. Vis. Image Underst..

[2]  Richard M. Leahy,et al.  An Optimal Graph Theoretic Approach to Data Clustering: Theory and Its Application to Image Segmentation , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[4]  Wallapak Tavanapong,et al.  ShotWeave: A Shot Clustering Technique for Story Browsing for Large Video Databases , 2002, EDBT Workshops.

[5]  Atreyi Kankanhalli,et al.  Automatic partitioning of full-motion video , 1993, Multimedia Systems.

[6]  Chong-Wah Ngo,et al.  Motion-Based Video Representation for Scene Change Detection , 2004, International Journal of Computer Vision.

[7]  D. Arijon,et al.  Grammar of Film Language , 1976 .

[8]  Sudeep Sarkar,et al.  Analysis of MinCut, average cut and normalized cut measures , 2001 .

[9]  Ramesh C. Jain,et al.  Knowledge-guided parsing in video databases , 1993, Electronic Imaging.

[10]  John R. Kender,et al.  Video Summaries through Mosaic-Based Shot and Scene Clustering , 2002, ECCV.

[11]  Niels Haering,et al.  A framework for the design of event detectors , 1999 .

[12]  Alan Hanjalic,et al.  Automated high-level movie segmentation for advanced video-retrieval systems , 1999, IEEE Trans. Circuits Syst. Video Technol..

[13]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  John P. Oakley,et al.  Storage and Retrieval for Image and Video Databases , 1993 .

[15]  Mubarak Shah,et al.  A framework for segmentation of talk and game shows , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[16]  Boon-Lock Yeo,et al.  Rapid scene analysis on compressed video , 1995, IEEE Trans. Circuits Syst. Video Technol..

[17]  Wayne H. Wolf,et al.  Key frame selection by motion analysis , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[18]  Irena Koprinska,et al.  Temporal video segmentation: A survey , 2001, Signal Process. Image Commun..

[19]  Thomas S. Huang,et al.  Constructing table-of-content for videos , 1999, Multimedia Systems.

[20]  Sudeep Sarkar,et al.  Supervised Learning of Large Perceptual Organization: Graph Spectral Partitioning and Learning Automata , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Shih-Fu Chang,et al.  Scene change detection in an MPEG-compressed video sequence , 1995, Electronic Imaging.

[22]  Jean-Marc Odobez,et al.  Spectral Structuring of Home Videos , 2003, CIVR.

[23]  M. Ibrahim Sezan,et al.  A semantic event-detection approach and its application to detecting hunts in wildlife vide , 2000, IEEE Trans. Circuits Syst. Video Technol..

[24]  Yueting Zhuang,et al.  Adaptive key frame extraction using unsupervised clustering , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).