HTK-Based Recognition of Whispered Speech

This paper presents results on whispered speech recognition of isolated words with Whi-Spe database, in speaker dependent mode. Word recognition rate is calculated for all speakers, four train/test scenarios, three values of mixture components, with modeling of context independent monophones, context dependent triphones and whole words. As a feature vector, Mel Frequency Cepstral Coefficients was used. The HTK, toolkit for building Hidden Markov Models, was used to implement isolated word recognizer. The best obtained results in match scenarios showed nearly equal recognition rate of 99.86% in normal speech recognition, and 99.90% in whispered speech recognition. Specifically, in mismatch scenarios, the best achieved recognition rate was 64.80% for training on part of normally phonated speech and testing on whispered speech and, in the opposite case, with training on whispered speech, the normal speech recognition was 74.88%.