Video scene segmentation via continuous video coherence

In extended video sequences, individual frames are grouped into shots, each defined as a sequence taken by a single camera, and related shots are grouped into scenes, each defined as a single dramatic event taken by a small number of related cameras. This hierarchical structure is deliberately constructed, dictated by the limitations and preferences of the human visual and memory systems. We present three novel high-level segmentation results derived from these considerations, some of which are analogous to those involved in perceiving the structure of music. First and primarily, we derive and demonstrate a method for measuring probable scene boundaries by computing a short-term-memory-based model of shot-to-shot "coherence". Detecting local minima in this continuous measure permits robust and flexible segmentation of the video into scenes, without first having to aggregate shots into clusters. Second, and independently of the first, we derive and demonstrate a one-pass, on-the-fly shot clustering algorithm. Third, we report partially successful results from applying these two new methods to the next higher level of video structure, the "theme" level.
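The idea of a continuous coherence measure whose local minima mark scene boundaries can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' formulation: each shot is assumed to be summarized by a normalized color histogram and a start time in seconds, shot similarity is assumed to be histogram intersection, and short-term memory is modeled, hypothetically, as a window of `memory` shots on each side of a boundary with an exponential forgetting factor controlled by `decay`. All function and parameter names are illustrative.

```python
import math
from typing import List, Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for two normalized histograms (assumed shot feature)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def coherence(shots: List[Sequence[float]], starts: List[float],
              memory: int = 3, decay: float = 10.0) -> List[float]:
    """Coherence at each boundary between shot k-1 and shot k, for k = 1..n-1.

    Hypothetical model: every shot in a window of `memory` shots before the
    boundary is compared with every shot in a window of `memory` shots after
    it; each similarity is weighted by exp(-gap / decay), where gap is the
    time between the two shot starts ("forgetting"). The weighted sum is
    averaged over pairs so that boundaries near the ends of the sequence,
    which have fewer pairs, are not biased low.
    """
    n = len(shots)
    result = []
    for k in range(1, n):
        before = range(max(0, k - memory), k)
        after = range(k, min(n, k + memory))
        total = 0.0
        for a in before:
            for b in after:
                gap = starts[b] - starts[a]
                total += histogram_intersection(shots[a], shots[b]) * math.exp(-gap / decay)
        result.append(total / (len(before) * len(after)))
    return result


def scene_boundaries(coh: List[float]) -> List[int]:
    """Return indices k of shots that start a new scene.

    coh[i] is the coherence of the boundary before shot i + 1; a boundary is
    reported when its coherence is a strict local minimum.
    """
    bounds = []
    for i in range(1, len(coh) - 1):
        if coh[i] < coh[i - 1] and coh[i] < coh[i + 1]:
            bounds.append(i + 1)
    return bounds


# Toy example: three "reddish" shots followed by three "bluish" shots
# (two-bin histograms), one shot every 5 seconds.
shots = [[1, 0], [0.9, 0.1], [1, 0], [0, 1], [0.1, 0.9], [0, 1]]
starts = [0, 5, 10, 15, 20, 25]
coh = coherence(shots, starts, memory=2, decay=10.0)
print(scene_boundaries(coh))  # [3]
```

In the toy example the coherence dips at the boundary between the third and fourth shots, so a scene is reported as starting at shot index 3. No clustering step is needed: the boundary falls directly out of the shape of the continuous curve, and a larger `memory` or `decay` would smooth the curve and suppress shallower minima.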
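A one-pass, on-the-fly shot clustering can be illustrated with a simple leader-style scheme. This is a swapped-in technique shown for illustration, not necessarily the algorithm derived in the paper. Each incoming shot, again assumed to be a normalized histogram, is compared with the running-mean centroid of every existing cluster; it joins the most similar cluster if the histogram-intersection similarity reaches a hypothetical `threshold`, and otherwise starts a new cluster. The names `online_cluster` and `threshold` are illustrative.

```python
from typing import List, Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for two normalized histograms (assumed shot feature)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def online_cluster(shots: List[Sequence[float]], threshold: float = 0.7) -> List[int]:
    """Assign each shot a cluster label in a single pass over the sequence."""
    centroids: List[List[float]] = []  # running mean histogram per cluster
    counts: List[int] = []             # number of shots in each cluster
    labels: List[int] = []
    for h in shots:
        # Find the most similar existing cluster, if any.
        best, best_sim = -1, -1.0
        for c, centroid in enumerate(centroids):
            sim = histogram_intersection(h, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            # Join the cluster and update its centroid incrementally.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [m + (x - m) / n for m, x in zip(centroids[best], h)]
            labels.append(best)
        else:
            # No cluster is similar enough: start a new one.
            centroids.append(list(h))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


# Toy example: two alternating camera setups, as in a dialogue scene.
shots = [[1, 0], [0.9, 0.1], [0, 1], [0.95, 0.05], [0.1, 0.9]]
print(online_cluster(shots, threshold=0.7))  # [0, 0, 1, 0, 1]
```

Because each shot is visited once and never reassigned, the cost is proportional to the number of shots times the number of clusters, and labels are available as the video streams in. The trade-off is that the result depends on shot order and on the choice of `threshold`.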
