Speech/Music Discrimination using Spectral Peak Feature for Speaker Indexing

We present a new speech/music discrimination method based on a spectral peak feature and a threshold on spectral peak duration. The focus is on extracting a feature that reflects the duration characteristics of spectral peaks, while keeping discrimination fast and accurate. We extract the spectral peak feature from each peak track of the audio spectrum and normalize it by the segment length. Speech and music can then be easily discriminated by applying the duration threshold to the extracted spectral peak duration feature. We evaluate the method on speech (Korean, English, Chinese, and Japanese) and various kinds of pop music (ballad, rock, etc.), for a total of 26,773 seconds of audio data. The average accuracy is 96.21% for speech and 89.49% for music. The experimental results show that our feature vector is suitable for speech/music discrimination and is computationally efficient.
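
The pipeline described above (peak tracks, a duration feature normalized by segment length, and a threshold decision) can be illustrated with a minimal sketch. It is not the authors' implementation: the peak picking rule, the track matching tolerance, the use of the mean track length, the threshold value of 0.5, and the assumption that music holds spectral peaks longer than speech are all illustrative choices that the abstract does not specify. All function and parameter names are hypothetical. The input is a magnitude spectrogram given as a list of frames, each a list of bin magnitudes.

```python
from typing import Dict, List, Sequence


def peak_bins(frame: Sequence[float], num_peaks: int = 3) -> List[int]:
    """Return the bin indices of the strongest local maxima in one frame."""
    maxima = [
        k for k in range(1, len(frame) - 1)
        if frame[k] > frame[k - 1] and frame[k] >= frame[k + 1]
    ]
    maxima.sort(key=lambda k: frame[k], reverse=True)
    return sorted(maxima[:num_peaks])


def peak_track_durations(
    spectrogram: Sequence[Sequence[float]],
    num_peaks: int = 3,
    tolerance: int = 1,  # assumed: max bin drift allowed between frames
) -> List[int]:
    """Return the length, in frames, of every spectral peak track."""
    active: Dict[int, int] = {}  # current bin -> track length so far
    finished: List[int] = []
    for frame in spectrogram:
        unmatched = dict(active)
        next_active: Dict[int, int] = {}
        for k in peak_bins(frame, num_peaks):
            near = [b for b in unmatched if abs(b - k) <= tolerance]
            if near:
                # Continue the closest existing track.
                b = min(near, key=lambda b: abs(b - k))
                next_active[k] = unmatched.pop(b) + 1
            else:
                # No nearby track: start a new one.
                next_active[k] = 1
        # Tracks with no continuation in this frame have ended.
        finished.extend(unmatched.values())
        active = next_active
    finished.extend(active.values())
    return finished


def duration_feature(spectrogram: Sequence[Sequence[float]]) -> float:
    """Mean peak track length divided by the segment length in frames."""
    durations = peak_track_durations(spectrogram)
    if not durations:
        return 0.0
    return (sum(durations) / len(durations)) / len(spectrogram)


def classify(
    spectrogram: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> str:
    """Label a segment by thresholding its duration feature."""
    return "music" if duration_feature(spectrogram) >= threshold else "speech"
```

In this sketch a segment whose strongest peak stays in one bin for its whole length gets a feature value of 1, while a segment whose peak jumps to a distant bin in every frame gets a value of one over the number of frames. A real system would compute the spectrogram from short-time Fourier transforms and tune the threshold on labeled data.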
