Classification of Music and Speech in Mandarin News Broadcasts

Audio scene analysis refers to the problem of classifying segments in a continuous audio stream according to content, e.g. speech versus non-speech, music, ambient noise, etc. Techniques that support such automatic segmentation are indispensable for multimedia information processing. For example, segmentation is a precursor to processes such as indexing of speech segments by automatic speech recognition, automatic story segmentation based on recognition transcripts, and speaker diarization. This paper describes our work in the development of a speech/music discriminator for Mandarin broadcast news audio. We formed a high-dimensional feature vector that includes LPCC, LPS and STFT coefficients, 94 in total. We also experimented with three classifiers: KNN, SVM and MLP. Experiments based on the Voice of America Mandarin news broadcasts show high classification performance, with F-measure = 0.98. The SVM also strikes the best balance between classification performance and computation time (real-time) among the three classifiers.
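To make the evaluation setup concrete, the following is a minimal sketch of a nearest-neighbour speech/music classifier scored with the F-measure. It is not the authors' implementation: the 2-D feature vectors are synthetic stand-ins for the 94-dimensional LPCC/LPS/STFT vector, the function names `knn_predict` and `f_measure` are hypothetical, and the balanced F-measure (harmonic mean of precision and recall) is assumed because the abstract does not say which variant was used.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def f_measure(y_true, y_pred, positive="speech"):
    """Balanced F-measure (assumed variant) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic 2-D features (assumption): real segments would carry 94 coefficients.
train = [
    ((0.10, 0.20), "speech"), ((0.20, 0.10), "speech"), ((0.15, 0.25), "speech"),
    ((0.90, 0.80), "music"), ((0.80, 0.90), "music"), ((0.85, 0.75), "music"),
]
test = [((0.12, 0.18), "speech"), ((0.88, 0.82), "music"), ((0.30, 0.30), "speech")]

y_true = [label for _, label in test]
y_pred = [knn_predict(train, features) for features, _ in test]
score = f_measure(y_true, y_pred)
```

The same `f_measure` function could score an SVM or MLP by swapping out `knn_predict`, which is how the three classifiers would be compared on equal terms.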
