Paper information - Enhancement and recognition of whispered speech

Enhancement and recognition of whispered speech

The goal of this thesis is to study whispering from a signal processing perspective. Although whispering is a common mode of speech, there is little research on the subject, and even less in the traditional processing fields. To fill this gap, this thesis focuses on three areas: whisper-to-voice speech conversion, noise mitigation for speech coding, and speech recognition. In the speech conversion task, the relationships between normal and whispered speech are cast as a statistical estimation problem. To model whispered speech, new statistical models of speech based on jump Markov linear systems (JMLS) are developed to determine interframe relationships in the mixed excitation linear prediction (MELP) model. In addition, new methods for modifying linear prediction spectra are developed and used to explore the acoustic differences between phonated and whispered speech. These algorithms are combined to create estimates of the MELP parameters of normal speech, which are in turn synthesized with a MELP decoder to create normal speech. In order to remove noise from whispered speech, several algorithms for estimating spectral parameters in noisy environments are proposed. These schemes are based on direct estimation of the LPC parameters instead of the spectrum. To improve these estimates, the JMLS-based spectral models described above are applied to develop spectral smoothers. These methods are shown to achieve better performance than traditional speech enhancers on whispered speech when the noise is nearly stationary. Finally, speech and speaker recognition tasks are conducted on whispered speech data. When speech recognition systems trained on normal speech were applied to whispered speech, the performance was found to drop significantly due to the model mismatch. Methods that modified the spectrum were found to help slightly, but the greatest performance gain was made with model adaptation methods.
Overall, most speech processing algorithms for normal speech are applicable to whispers. However, there is usually a performance drop from normal speech to whispered speech, which can be reduced by using methods designed for whispered speech. In addition, many of the methods developed for whispered speech are applicable to normal speech.

Mark A. Clements | Robert W. Morris