Identification of story units in audio-visual sequences by joint audio and video processing

A novel technique that uses joint audio-visual analysis for scene identification and characterization is proposed. The paper defines four scene types: dialogues, stories, actions, and generic scenes. It then explains how any audio-visual material can be decomposed into a series of scenes obeying this classification by analyzing and then combining the underlying audio and visual information. A rule-based procedure is defined for this purpose. Before the rule-based decision can take place, a series of low-level pre-processing tasks is suggested to adequately measure audio and visual correlations. As far as visual information is concerned, the similarities between non-consecutive shots are measured using a learning vector quantization approach. A possible implementation strategy for the overall scene identification task is outlined and validated through a series of experimental simulations on real audio-visual data.
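The abstract does not give the details of the learning vector quantization (LVQ) step, so the following is only a minimal sketch of the classical LVQ1 rule that such a shot-similarity stage could build on: per-class prototype vectors are attracted toward feature vectors of their own class and repelled from those of other classes. All names (`lvq1_train`, `classify`), the feature dimensionality, and the learning schedule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1 sketch (hypothetical, not the paper's exact procedure).

    X: (n, d) shot feature vectors; y: (n,) class labels.
    prototypes: (k, d) initial codebook; proto_labels: (k,) labels.
    The nearest prototype is moved toward a same-class sample and
    away from a different-class sample.
    """
    P = prototypes.astype(float).copy()
    for _ in range(epochs):
        for x, label in zip(X, y):
            i = np.argmin(np.linalg.norm(P - x, axis=1))  # nearest prototype
            if proto_labels[i] == label:
                P[i] += lr * (x - P[i])   # attract toward the sample
            else:
                P[i] -= lr * (x - P[i])   # repel from the sample
    return P

def classify(x, prototypes, proto_labels):
    """Assign x the label of its nearest prototype."""
    return proto_labels[np.argmin(np.linalg.norm(prototypes - x, axis=1))]
```

In a shot-similarity setting, the trained codebook lets non-consecutive shots be compared by the prototypes (visual classes) their feature vectors map to, rather than by raw frame differences.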
