Analysis and recognition of whispered speech

Abstract In this study, we examine the acoustic characteristics of whispered speech and address some of the issues involved in recognizing whispered speech used for communication over a mobile phone in a noisy environment. The acoustic analysis shows that the formant frequencies of vowels in whispered speech are shifted upward relative to normal speech. Voiced consonants in whispered speech have lower energy at low frequencies (up to 1.5 kHz) and greater spectral flatness than in normal speech. In whispered speech recognition experiments, our studies on model adaptation show that a small amount of whispered speech data from a target speaker can be used effectively to adapt the acoustic models for recognizing that speaker's whispered speech. In a noisy environment, recognition accuracy degrades much more for whispered speech than for normally phonated speech of the same content. Covering the mouth with a hand to increase the SNR is shown to improve recognition accuracy for whispered speech, a speaking mode frequently used for private communication in noisy environments.
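The two spectral measures mentioned in the analysis, energy below 1.5 kHz and spectral flatness, can be computed with standard frame-based processing. The following is a minimal sketch (not the paper's actual analysis code), assuming 16-bit mono WAV recordings with the hypothetical file names normal_utt.wav and whisper_utt.wav; it reports the mean per-frame spectral flatness and the mean fraction of energy below 1.5 kHz so the two speaking modes can be compared.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import get_window

def frame_signal(x, frame_len, hop):
    """Split a mono signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    window = get_window("hamming", frame_len)
    return np.stack([x[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])

def spectral_flatness(frames, eps=1e-12):
    """Per-frame spectral flatness: geometric mean / arithmetic mean of the power spectrum."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps
    geo = np.exp(np.mean(np.log(power), axis=1))
    arith = np.mean(power, axis=1)
    return geo / arith

def low_band_energy_ratio(frames, fs, cutoff_hz=1500.0, eps=1e-12):
    """Per-frame fraction of spectral energy below cutoff_hz (1.5 kHz as in the analysis)."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)
    low = power[:, freqs <= cutoff_hz].sum(axis=1)
    return low / (power.sum(axis=1) + eps)

if __name__ == "__main__":
    # Hypothetical file names; any matched pair of normal/whispered recordings would do.
    for label, path in [("normal", "normal_utt.wav"), ("whisper", "whisper_utt.wav")]:
        fs, x = wavfile.read(path)          # assumes 16-bit mono PCM
        x = x.astype(np.float64) / 32768.0
        frames = frame_signal(x, frame_len=int(0.025 * fs), hop=int(0.010 * fs))
        print(f"{label}: mean flatness = {spectral_flatness(frames).mean():.3f}, "
              f"mean low-band energy ratio = {low_band_energy_ratio(frames, fs).mean():.3f}")
```

Under the paper's findings, the whispered recording would be expected to show higher mean flatness and a lower low-band energy ratio than the normal one, particularly in voiced-consonant regions.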
