Video scene segmentation using Markov chain Monte Carlo

Videos are composed of many shots that are caused by different camera operations, e.g., on/off operations and switching between cameras. One important goal in video analysis is to group the shots into temporal scenes, such that all the shots in a single scene are related to the same subject, which could be a particular physical setting, an ongoing action or a theme. In this paper, we present a general framework for temporal scene segmentation in various video domains. The proposed method is formulated in a statistical fashion and uses the Markov chain Monte Carlo (MCMC) technique to determine the boundaries between video scenes. In this approach, a set of arbitrary scene boundaries are initialized at random locations and are automatically updated using two types of updates: diffusion and jumps. Diffusion is the process of updating the boundaries between adjacent scenes. Jumps consist of two reversible operations: the merging of two scenes and the splitting of an existing scene. The posterior probability of the target distribution of the number of scenes and their corresponding boundary locations is computed based on the model priors and the data likelihood. The updates of the model parameters are controlled by the hypothesis ratio test in the MCMC process, and the samples are collected to generate the final scene boundaries. The major advantage of the proposed framework is two-fold: 1) it is able to find the weak boundaries as well as the strong boundaries, i.e., it does not rely on the fixed threshold; 2) it can be applied to different video domains. We have tested the proposed method on two video domains: home videos and feature films, and accurate results have been obtained

[1]  Boon-Lock Yeo,et al.  Segmentation of Video by Clustering and Graph Analysis , 1998, Comput. Vis. Image Underst..

[2]  John R. Kender,et al.  Video Summaries through Mosaic-Based Shot and Scene Clustering , 2002, ECCV.

[3]  Mubarak Shah,et al.  Visual Content-Based Segmentation of Talk and Game Shows , 2002 .

[4]  Shih-Fu Chang,et al.  Generative, discriminative, and ensemble learning on multi-modal perceptual fusion toward news video story segmentation , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[5]  A. Murat Tekalp,et al.  Temporal video segmentation using unsupervised clustering and semantic object tracking , 1998, J. Electronic Imaging.

[6]  John R. Kender,et al.  Video scene segmentation via continuous video coherence , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[7]  Chong-Wah Ngo,et al.  Motion-Based Video Representation for Scene Change Detection , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[8]  Shrikanth Narayanan,et al.  Movie Content Analysis, Indexing and Skimming Via Multimodal Information , 2003 .

[9]  Wolfgang Effelsberg,et al.  Scene Determination Based on Video and Audio Features , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[10]  Alan Hanjalic,et al.  Automated high-level movie segmentation for advanced video-retrieval systems , 1999, IEEE Trans. Circuits Syst. Video Technol..

[12]  Frank Dellaert,et al.  An MCMC-Based Particle Filter for Tracking Multiple Interacting Targets , 2004, ECCV.

[13]  Zhuowen Tu,et al.  Range image segmentation by an effective jump-diffusion method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Omar Javed,et al.  University of Central Florida at TRECVID 2004 , 2003, TRECVID.

[15]  Keiichiro Hoashi,et al.  Shot Boundary Determination on MPEC Compressed Domain and Story Segmentation Experiments for TRECVID 2003 , 2003, TRECVID.

[16]  Mubarak Shah,et al.  Scene detection in Hollywood movies and TV shows , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[17]  A. Shapiro Monte Carlo Sampling Methods , 2003 .

[18]  Svetha Venkatesh,et al.  Novel approach to determining tempo and dramatic story sections in motion pictures , 2000, Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101).

[19]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[20]  Frank Dellaert,et al.  EM, MCMC, and Chain Flipping for Structure from Motion with Unknown Correspondence , 2004, Machine Learning.

[21]  Zhuowen Tu,et al.  Image Segmentation by Data-Driven Markov Chain Monte Carlo , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Walter R. Gilks,et al.  Bayesian model comparison via jump diffusions , 1995 .

[23]  Alan L. Yuille,et al.  Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multiband Image Segmentation , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[24]  Shih-Fu Chang,et al.  Video scene segmentation using video and audio features , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[25]  P. Green Reversible jump Markov chain Monte Carlo computation and Bayesian model determination , 1995 .

[26]  Michael I. Miller,et al.  REPRESENTATIONS OF KNOWLEDGE IN COMPLEX SYSTEMS , 1994 .

[27]  Julien Sénégas A Markov Chain Monte Carlo Approach to Stereovision , 2002, ECCV.

[28]  Sylvia Richardson,et al.  Markov Chain Monte Carlo in Practice , 1997 .

[29]  Wallapak Tavanapong,et al.  Shot clustering techniques for story browsing , 2004, IEEE Transactions on Multimedia.

[30]  Stephen W. Smoliar,et al.  An integrated system for content-based video retrieval and browsing , 1997, Pattern Recognit..

[31]  Liu Huayong,et al.  The segmentation of news video into story units , 2005 .

[32]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[33]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Mubarak Shah,et al.  A Multi-level Framework for Video Shot Structuring , 2005, ICIAR.

[35]  Rainer Lienhart,et al.  Scene Determination Based on Video and Audio Features , 2004, Multimedia Tools and Applications.

[36]  Boon-Lock Yeo,et al.  Video browsing using clustering and scene transitions on compressed sequences , 1995, Electronic Imaging.