EMS: Energy Minimization Based Video Scene Segmentation

This paper proposes a novel energy minimization based approach to video scene segmentation. In video content analysis, a scene is defined as a set of adjacent shots related to a particular setting or to a continuous action in one place. This definition implies that scene segmentation should take into account not only the global distribution of time and content but also local temporal continuity. Motivated by this observation, we formulate the segmentation procedure as a unified energy minimization framework in which the global and local constraints are represented by content energy and context energy, respectively. The energy is minimized in two steps applied iteratively: first, a generative model produces an initial estimate of the scene labels from the content energy; then iterated conditional modes (ICM) is applied to the context energy to search for the global optimum. Furthermore, a boundary voting procedure is devised to decide the optimal scene boundaries. We apply EMS to an extensive set of home videos and feature movies and report superior performance compared with several existing key approaches to scene segmentation.
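The two-term energy and the ICM refinement described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual energy definitions, so everything specific here is an assumption: the content energy is taken to be the squared distance between a shot feature and its scene mean (a stand-in for the generative model), the context energy is a Potts-style penalty of weight `beta` on adjacent shots with different labels, and the names `icm_scene_labels` and `scene_boundaries` are hypothetical. The boundary voting procedure is not modeled.

```python
import numpy as np

def icm_scene_labels(features, init_labels, n_scenes, beta=1.0, max_iters=20):
    """Refine per-shot scene labels with iterated conditional modes (ICM).

    Assumed energy (not taken from the paper):
      content: squared distance of a shot feature to its scene mean
      context: beta for each temporally adjacent shot with a different label
    """
    features = np.asarray(features, dtype=float)   # shape (n_shots, dim)
    labels = np.asarray(init_labels).copy()
    n_shots = len(labels)
    candidates = np.arange(n_scenes)

    for _ in range(max_iters):
        # Re-estimate scene means from the current labeling; an empty scene
        # gets an infinite mean so that no shot is assigned to it.
        centers = np.array([
            features[labels == k].mean(axis=0)
            if np.any(labels == k)
            else np.full(features.shape[1], np.inf)
            for k in range(n_scenes)
        ])
        changed = False
        for i in range(n_shots):
            # Content energy of assigning shot i to each candidate scene.
            content = np.sum((features[i] - centers) ** 2, axis=1)
            # Context energy: penalize disagreement with temporal neighbors.
            context = np.zeros(n_scenes)
            for j in (i - 1, i + 1):
                if 0 <= j < n_shots:
                    context += beta * (candidates != labels[j])
            best = int(np.argmin(content + context))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:   # converged: no label moved in a full sweep
            break
    return labels

def scene_boundaries(labels):
    """Return indices of shots that start a new scene."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

# Toy input: six shots with 1-D features and a noisy initial labeling.
toy_features = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
toy_labels = icm_scene_labels(toy_features, [0, 0, 1, 0, 1, 1], n_scenes=2)
toy_boundaries = scene_boundaries(toy_labels)
```

ICM updates one label at a time while holding the others fixed, so each sweep can only lower this energy or leave it unchanged. In general it converges to a local minimum that depends on the initial labeling, which is why a reasonable initial estimate matters.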
