Paper information - Automatic Detection of High Vocal Effort in Telephone Speech

A system is proposed for the automatic detection of high vocal effort in speech. The system is evaluated using both PCM-coded speech and AMR-coded telephone speech. In addition, the effect of far-end noise in the telephone conditions is studied using both matched-condition training and cases with additive noise mismatch. The proposed system is based on Bayesian classification of mel-frequency cepstral coefficient (MFCC) feature vectors. In the MFCC feature extraction process, substituting a spectrum analysis method that emphasizes the fine structure improves the results in the noisy cases.
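As a rough illustration of what Bayesian classification of cepstral feature vectors can look like, the following minimal sketch fits one diagonal-covariance Gaussian per class and labels an utterance by the largest sum of log prior and frame log-likelihoods. Everything specific here is an assumption made for illustration and is not taken from the paper: the class names, the single-Gaussian class models, the equal priors, the treatment of frames as independent, and the synthetic 13-dimensional vectors that stand in for real MFCCs.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)  # variance floor
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(frame, model):
    """Log density of one frame under a diagonal-covariance Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, means, variances)
    )


def classify_utterance(frames, models, log_priors):
    """Pick the class maximizing log prior + summed frame log-likelihoods.

    Summing over frames assumes the frames are independent (a simplification).
    """
    scores = {
        label: log_priors[label] + sum(log_likelihood(f, model) for f in frames)
        for label, model in models.items()
    }
    return max(scores, key=scores.get), scores


def synthetic_frames(rng, center, n_frames, dim=13):
    """Hypothetical stand-in for MFCC vectors: Gaussian noise around `center`."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
train = {
    "normal": synthetic_frames(rng, 0.0, 200),
    "high_effort": synthetic_frames(rng, 1.5, 200),
}
models = {label: fit_diag_gaussian(frames) for label, frames in train.items()}
log_priors = {label: math.log(0.5) for label in models}  # assumed equal priors

test_utterance = synthetic_frames(rng, 1.5, 50)
predicted, scores = classify_utterance(test_utterance, models, log_priors)
print(predicted)
```

In a real setting the synthetic vectors would be replaced by MFCCs computed from the coded speech, and the single Gaussians could be swapped for richer class-conditional densities without changing the decision rule.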

Paavo Alku | Tuomo Raitio | Jouni Pohjalainen | Hannu Pulakka
