Automatic Detection of High Vocal Effort in Telephone Speech

A system is proposed for the automatic detection of high vocal effort in speech. The system is evaluated on both PCM-coded speech and AMR-coded telephone speech. The effect of far-end noise in the telephone conditions is also studied, using both matched-condition training and conditions with additive-noise mismatch. The proposed system is based on Bayesian classification of mel-frequency cepstral coefficient (MFCC) feature vectors. In the MFCC feature extraction process, substituting a spectrum analysis method that emphasizes the fine structure improves the results in the noisy cases.
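As an illustration of Bayesian classification of cepstral feature vectors, the following is a minimal sketch and not the system described above. It makes several assumptions that the abstract does not state: each class is modeled by a single diagonal-covariance Gaussian, frames are treated as independent, the priors are equal, and synthetic 12-dimensional vectors stand in for real MFCCs. All function and label names (`fit_diag_gaussian`, `classify`, `"high_effort"`, and so on) are hypothetical.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Estimate a per-dimension mean and variance from a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance to avoid division by zero on degenerate data.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def frame_log_likelihood(frame, model):
    """Log-density of one feature vector under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))


def classify(frames, models, log_priors):
    """Pick the class with the highest log-posterior score for a segment.

    Assumption: frames are independent, so frame log-likelihoods add up.
    """
    scores = {
        label: log_priors[label]
        + sum(frame_log_likelihood(f, model) for f in frames)
        for label, model in models.items()
    }
    return max(scores, key=scores.get), scores


def synthetic_frames(rng, center, n, dim=12):
    """Stand-in for MFCC vectors: unit-variance Gaussian noise around `center`."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)

# Hypothetical training data for the two classes.
train = {
    "normal": synthetic_frames(rng, 0.0, 200),
    "high_effort": synthetic_frames(rng, 1.5, 200),
}
models = {label: fit_diag_gaussian(frames) for label, frames in train.items()}
log_priors = {"normal": math.log(0.5), "high_effort": math.log(0.5)}

# Classify an unseen segment drawn from the "high_effort" distribution.
segment = synthetic_frames(rng, 1.5, 50)
label, scores = classify(segment, models, log_priors)
print(label)
```

In practice, the class-conditional densities would be trained on real MFCC vectors, and a richer model than a single Gaussian per class may be needed. The decision rule stays the same: compare prior-weighted class likelihoods and choose the larger one.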
