Paper Information - Robust Features for Effective Speech and Music Discrimination

Speech and music discrimination is one of the most important problems in multimedia information retrieval and efficient coding. Many features have been proposed, but few of them remain robust under noisy conditions, especially in telecommunication applications. This paper presents two novel features based on the real cepstrum that capture essential differences between music and speech: Average Pitch Density (APD) and Relative Tonal Power Density (RTPD). Separate histograms are used to demonstrate the robustness of the novel features. Discrimination experiments show that these features are more robust than commonly used features. The evaluation database consists of a reference collection and a set of telephone speech and music recorded in the real world.
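
Since both features are described as being based on the real cepstrum, a minimal sketch of that underlying quantity may help. It uses only the standard definition (the inverse DFT of the log magnitude spectrum) and does not attempt to reproduce APD or RTPD, whose definitions are not given in this abstract. The function names, the frame length, and the synthetic test frame (an impulse plus a half-amplitude copy 16 samples later) are assumptions chosen purely for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = dft(frame)
    # Floor the magnitude so log() never receives zero.
    log_mag = [math.log(max(abs(s), floor)) for s in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]


def cepstral_peak(cepstrum, min_q, max_q):
    """Quefrency (in samples) of the largest cepstral value in [min_q, max_q]."""
    return max(range(min_q, max_q + 1), key=lambda q: cepstrum[q])


# Hypothetical test frame: a unit impulse plus a half-amplitude repetition
# DELAY samples later. The repetition shows up as a cepstral peak at DELAY.
N = 256
DELAY = 16
frame = [0.0] * N
frame[0] = 1.0
frame[DELAY] = 0.5

cep = real_cepstrum(frame)
peak = cepstral_peak(cep, 1, N // 2)
print(peak)  # 16, the repetition delay in samples
```

A repetition at a fixed lag in the frame produces a peak at the matching quefrency, which is the property that pitch-related cepstral measures generally rely on. The naive DFT keeps the sketch dependency-free; an FFT routine would be the usual choice for real audio frames.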

Jhing-Fa Wang | Zhong-Hua Fu
