A general framework for temporal video scene segmentation

Videos are composed of many shots caused by different camera operations, e.g., on/off operations and switching between cameras. One important goal in video analysis is to group the shots into temporal scenes, such that all the shots in a single scene are related to a particular physical setting, an ongoing action, or a theme. In this paper, we present a general framework for temporal scene segmentation that applies to various video types. The proposed method is formulated statistically and uses the Markov chain Monte Carlo (MCMC) technique to determine the boundaries between video scenes. In this approach, an arbitrary number of scene boundaries are randomly initialized and then automatically updated using two types of moves: diffusion and jumps. The posterior probability of the number of scenes and their boundary locations is computed from the model priors and the data likelihood. The updates of the model parameters are controlled by the hypothesis ratio test in the MCMC process. The proposed framework has been tested on two types of videos, home videos and feature films, and accurate results have been obtained.
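The boundary search described above can be illustrated with a minimal sketch. It is not the paper's formulation: the one-dimensional shot features, the squared-error likelihood, the per-scene penalty standing in for the model prior, and all function names (`log_posterior`, `propose`, `segment_scenes`) are assumptions made for illustration. The acceptance step is a plain Metropolis-style ratio test that omits the proposal-ratio corrections a full reversible-jump sampler would need, so the sketch behaves as a stochastic search that keeps the best state visited.

```python
import math
import random


def log_posterior(features, boundaries, penalty=3.0):
    """Toy log-posterior (assumed form): within-scene squared error
    as the likelihood term, minus a fixed penalty per scene as the prior."""
    edges = [0] + sorted(boundaries) + [len(features)]
    score = 0.0
    for start, end in zip(edges, edges[1:]):
        segment = features[start:end]
        mean = sum(segment) / len(segment)
        score -= sum((x - mean) ** 2 for x in segment)
    return score - penalty * (len(edges) - 1)


def propose(boundaries, n_shots, rng):
    """Return a candidate boundary set from one random move.
    'diffuse' shifts a boundary by one shot; 'split' and 'merge'
    are jumps that add or remove a boundary."""
    candidate = set(boundaries)
    free = [i for i in range(1, n_shots) if i not in candidate]
    move = rng.choice(["diffuse", "split", "merge"])
    if move == "diffuse" and candidate:
        b = rng.choice(sorted(candidate))
        target = b + rng.choice([-1, 1])
        if 1 <= target < n_shots and target not in candidate:
            candidate.remove(b)
            candidate.add(target)
    elif move == "split" and free:
        candidate.add(rng.choice(free))
    elif move == "merge" and candidate:
        candidate.remove(rng.choice(sorted(candidate)))
    return candidate


def segment_scenes(features, n_iter=5000, seed=0):
    """Search for scene boundaries; returns the best boundary list found."""
    rng = random.Random(seed)
    n = len(features)
    # Random initialization with an arbitrary number of boundaries.
    state = set(rng.sample(range(1, n), k=min(3, n - 1)))
    state_lp = log_posterior(features, state)
    best, best_lp = set(state), state_lp
    for _ in range(n_iter):
        candidate = propose(state, n, rng)
        cand_lp = log_posterior(features, candidate)
        # Ratio test in log space; 1 - random() lies in (0, 1], so log is safe.
        if math.log(1.0 - rng.random()) < cand_lp - state_lp:
            state, state_lp = candidate, cand_lp
            if state_lp > best_lp:
                best, best_lp = set(state), state_lp
    return sorted(best)


if __name__ == "__main__":
    # Synthetic shot features with changes after shots 5 and 10 (made-up data).
    shots = [0.0] * 5 + [10.0] * 5 + [0.0] * 5
    print(segment_scenes(shots))
```

On the synthetic input in the `__main__` block, the search is expected to place boundaries at the two change points. With real video, each shot would be described by richer features, such as color histograms, and the likelihood and prior would be replaced by the models defined in the paper.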
