Boosting Speech/Non-speech Classification Using Averaged Mel-Frequency Cepstrum Coefficients Features

AdaBoost is used to boost and select the best sequence of weak classifiers for speech/non-speech classification. The weak classifiers are chosen to be simple threshold functions. The statistical mean and variance of the Mel-Frequency Cepstrum Coefficients (MFCCs), computed over all overlapping frames of an audio file, are used as audio features. Training and testing on a database of 410 audio files have shown that classification accuracy improves asymptotically as AdaBoost adds weak classifiers. A classification accuracy of 99.51% has been achieved on the test data. A comparison of AdaBoost with Nearest Neighbor and Nearest Center classifiers is also given.
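The feature step collapses a variable-length sequence of per-frame MFCC vectors into one fixed-length vector per audio file. The following is a minimal sketch, assuming the MFCCs have already been computed by some front end and are supplied as a list of equal-length per-frame vectors. The function name `pool_mfcc` is hypothetical, and the use of population (rather than sample) variance is an assumption, since the abstract does not say which is used.

```python
from statistics import fmean, pvariance

def pool_mfcc(frames):
    """Collapse per-frame MFCC vectors into [means..., variances...].

    frames: list of equal-length MFCC vectors, one per (overlapping) frame.
    Population variance is an assumption; the abstract does not specify it.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # Transpose so each tuple holds one coefficient's values across all frames.
    per_coeff = list(zip(*frames))
    means = [fmean(c) for c in per_coeff]
    variances = [pvariance(c) for c in per_coeff]
    return means + variances
```

For example, if each frame carried 13 coefficients, every file would become a 26-dimensional vector regardless of its duration, which is what allows a fixed set of per-feature threshold classifiers to operate on files of different lengths.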
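The boosting step can be illustrated with discrete AdaBoost (Freund and Schapire) over decision stumps, i.e. weak classifiers that threshold a single feature. This is a sketch under stated assumptions, not the paper's implementation: labels are taken as +1 for speech and -1 for non-speech, thresholds are found by exhaustive search over midpoints between sorted feature values, and the names `stump_predict`, `train_stump`, `adaboost` and `predict` are hypothetical.

```python
import math

# Hypothetical sketch: discrete AdaBoost with threshold (decision stump) weak
# classifiers. Labels are assumed to be +1 (speech) and -1 (non-speech).

def stump_predict(stump, x):
    """Threshold classifier: returns polarity if x[j] > t, else -polarity."""
    j, t, polarity = stump
    return polarity if x[j] > t else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) with lowest weighted error."""
    best_err, best_stump = math.inf, None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: one below every value (a constant classifier)
        # plus the midpoints between consecutive distinct values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                stump = (j, t, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost(X, y, rounds):
    """Return a list of (alpha, stump) pairs, one per boosting round."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples (y * h = -1) gain weight; correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps (ties go to +1)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

On the one-dimensional toy set `X = [[1.0], [2.0], [3.0]]` with `y = [1, -1, 1]`, a single round yields only a constant classifier that misclassifies the middle point, whereas three rounds classify all three points correctly: a small-scale picture of accuracy improving as weak classifiers are added.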
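The two comparison baselines named in the abstract can be sketched as follows, assuming Euclidean distance on the pooled feature vectors (the abstract does not specify the metric or whether features are normalized). Nearest Neighbor returns the label of the closest training vector; Nearest Center first averages each class into a single center and returns the label of the closest center. The function names are hypothetical.

```python
import math

# Hypothetical baselines; Euclidean distance (math.dist) is an assumption.

def nearest_neighbor(train_X, train_y, x):
    """1-NN: return the label of the single closest training vector."""
    i = min(range(len(train_X)), key=lambda k: math.dist(train_X[k], x))
    return train_y[i]

def nearest_center(train_X, train_y, x):
    """Return the label of the class whose mean vector (center) is closest to x."""
    centers = {}
    for label in set(train_y):
        members = [v for v, lab in zip(train_X, train_y) if lab == label]
        # Component-wise mean of this class's training vectors.
        centers[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centers, key=lambda label: math.dist(centers[label], x))
```

The two can disagree: with training points `[0, 0]` and `[10, 0]` labeled -1 and `[6, 0]` labeled +1, the query `[9, 0]` is nearest to `[10, 0]` (label -1), but nearest to the +1 center, because the -1 center sits at `[5, 0]`.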
