Video segmentation with the assistance of audio content analysis

Video structure extraction is essential to automatic content-based organization, retrieval, and browsing of video. Although many robust shot segmentation algorithms have been developed, it remains difficult to extract scene structures or to group shots into scenes. We present a novel audio-assisted video segmentation scheme that integrates audio and color information for video scene extraction. An audio segmentation scheme is developed to segment audio tracks into speech, music, environmental sound, and silence segments. A robust shot-grouping algorithm based on correlation analysis is also developed to further improve scene extraction accuracy.
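To make the idea of correlation-based shot grouping concrete, the following is a minimal sketch and not the algorithm of the paper, whose details the abstract does not give. It assumes each shot is already summarized by a color histogram, uses Pearson correlation as the similarity measure, and starts a new scene whenever a shot fails to correlate with the most recent shots of the current scene. The names `correlation` and `group_shots` and the parameters `threshold` and `window` are hypothetical choices made for illustration.

```python
from math import sqrt


def correlation(h1, h2):
    """Pearson correlation between two equal-length histograms."""
    n = len(h1)
    m1 = sum(h1) / n
    m2 = sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    var1 = sum((a - m1) ** 2 for a in h1)
    var2 = sum((b - m2) ** 2 for b in h2)
    if var1 == 0 or var2 == 0:
        # A flat histogram has no defined correlation; treat it as dissimilar.
        return 0.0
    return cov / sqrt(var1 * var2)


def group_shots(histograms, threshold=0.8, window=2):
    """Assign a scene label to each shot, in temporal order.

    Assumption (not from the paper): a shot stays in the current scene if
    its histogram correlates at or above `threshold` with at least one of
    the previous `window` shots that belong to that scene. Otherwise a new
    scene begins. Earlier scenes are never rejoined.
    """
    labels = []
    scene = 0
    for i, hist in enumerate(histograms):
        if i > 0:
            start = max(0, i - window)
            recent = [histograms[j] for j in range(start, i) if labels[j] == scene]
            if not any(correlation(hist, r) >= threshold for r in recent):
                scene += 1
        labels.append(scene)
    return labels


# Toy 3-bin histograms for five shots (illustrative values only).
shots = [[8, 1, 1], [7, 2, 1], [1, 1, 8], [1, 2, 7], [8, 1, 1]]
print(group_shots(shots))
```

In an audio-assisted scheme of the kind the abstract describes, the audio segment types (speech, music, environmental sound, silence) would supply additional evidence for scene boundaries; this sketch covers only the color side and leaves that combination out.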
