A novel efficient approach for audio segmentation

In this paper, a novel approach to audio segmentation is presented. The problem of detecting the limits of audio segments is treated as a binary classification task: frames are classified as "segment limits" vs. "non-segment limits". For each audio frame, a spectrogram is computed and eight feature values are extracted from the respective frequency bands. Final decisions are taken by a classifier combination scheme. The algorithm has very low complexity and runs in almost real time. It achieves an accuracy of 86% on real audio streams extracted from movies. Moreover, it introduces a general framework for audio segmentation that does not depend explicitly on the number of audio classes.
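
The per-frame pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the feature definition, the band layout, or the combination rule, so everything here is an assumption made for illustration: the names `band_features` and `combine_votes` are hypothetical, the features are taken to be log-energies over eight equal-width bands, and the combination is a plain majority vote.

```python
import numpy as np

def band_features(frame, n_bands=8):
    """Return one log-energy value per frequency band for a single audio frame.

    Assumption: equal-width bands over the power spectrum; the paper's
    actual band layout and feature definition are not given in the abstract.
    """
    windowed = frame * np.hanning(len(frame))      # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2     # power spectrum of the frame
    bands = np.array_split(power, n_bands)         # split the bins into n_bands groups
    return np.array([np.log(b.sum() + 1e-12) for b in bands])

def combine_votes(decisions):
    """Majority vote over binary decisions (1 = segment limit, 0 = non-segment limit).

    Assumption: a strict majority is required; ties resolve to 0.
    """
    decisions = np.asarray(decisions)
    return int(decisions.sum() * 2 > len(decisions))

# Example: a 440 Hz tone sampled at 16 kHz, one 512-sample frame.
t = np.arange(512) / 16000
features = band_features(np.sin(2 * np.pi * 440 * t))
label = combine_votes([1, 1, 0])  # three hypothetical classifier outputs
```

In a full system, each feature vector would be passed to several trained binary classifiers, and their per-frame outputs would be merged by a rule such as `combine_votes`.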
