Paper Information - HTK-Based Recognition of Whispered Speech

HTK-Based Recognition of Whispered Speech

This paper presents results on whispered speech recognition of isolated words using the Whi-Spe database in speaker-dependent mode. The word recognition rate is calculated for all speakers, four train/test scenarios, and three values of the number of mixture components, with modeling of context-independent monophones, context-dependent triphones, and whole words. Mel Frequency Cepstral Coefficients were used as the feature vector. HTK, a toolkit for building Hidden Markov Models, was used to implement the isolated word recognizer. In matched scenarios, the best results showed nearly equal recognition rates: 99.86% for normal speech and 99.90% for whispered speech. In mismatched scenarios, the best recognition rate was 64.80% when training on part of the normally phonated speech and testing on whispered speech and, in the opposite case, 74.88% when training on whispered speech and testing on normal speech.

Slobodan Jovicic | Jovan Galic | Branko Markovic | Dorde Grozdic
