The Recognition of Bimodal Produced Speech based on Multi-style Training

This paper presents an analysis of word recognition on the Whi-Spe speech database, which contains utterances in both normal and whispered phonation, using the conventional HMM/GMM framework. The analysis, based on multi-style training, is performed in both speaker-dependent (SD) and speaker-independent (SI) modes. It shows that only a small portion of whispered data in the training set (10%) is required to achieve whisper recognition accuracy above 90%, for both SD and SI recognition.
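The core of multi-style training here is simply building a training set that mixes a controlled fraction of whispered utterances into the normal-speech data before HMM/GMM training. A minimal sketch of that mixing step, under assumptions (the function name, utterance-ID lists, and the size-preserving replacement policy are all hypothetical, not taken from the paper):

```python
import random

def build_multistyle_train_set(normal_utts, whisper_utts,
                               whisper_fraction=0.10, seed=0):
    """Mix a fraction of whispered utterances into a normal-speech
    training set (hypothetical helper; the paper's exact mixing
    policy is not specified here).

    The total set size stays constant: sampled whispered items
    replace an equal number of normal items.
    """
    rng = random.Random(seed)
    n_total = len(normal_utts)
    n_whisper = round(whisper_fraction * n_total)
    whisper_part = rng.sample(whisper_utts, n_whisper)
    normal_part = rng.sample(normal_utts, n_total - n_whisper)
    return normal_part + whisper_part

# Example with dummy utterance identifiers
normal = [f"n{i}" for i in range(100)]
whisper = [f"w{i}" for i in range(100)]
train = build_multistyle_train_set(normal, whisper, whisper_fraction=0.10)
```

The resulting list would then feed the usual acoustic-model training pipeline; only the composition of the training data changes, not the HMM/GMM recipe itself.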
