Robust Recognition of Spontaneous Speech

This contribution describes the challenges addressed and the progress made in Verbmobil on robust speech recognition under adverse conditions such as channel distortion, environmental noise, and variation in speakers and speaking conditions. For the channel and noise problem, classical approaches such as cepstral bias normalization and spectral subtraction have been improved, as have newer methods such as parallel model combination. One major result is that an intelligent combination of several methods achieves the best results. Considerable progress has also been made in research on unsupervised speaker adaptation. Several approaches are presented to improve robustness against variations in speaking rate, speaking style, and speaker characteristics. The methods described include a new estimation of the parameters for vocal tract length normalization, feature and codebook transformation methods using ML algorithms, and pronunciation adaptation of the words in the lexicon.
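
As a rough illustration of the two classical techniques named above, the following minimal sketch shows textbook forms of cepstral bias (mean) normalization and magnitude spectral subtraction. It is not the Verbmobil implementation: the function names, the list-of-frames data layout, and the `floor` parameter are assumptions chosen for the example.

```python
def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean of each cepstral coefficient.

    cepstra: list of frames, each a list of cepstral coefficients (assumed layout).
    A stationary channel adds a constant offset in the cepstral domain,
    so removing the mean removes that offset.
    """
    num_frames = len(cepstra)
    num_coeffs = len(cepstra[0])
    means = [sum(frame[k] for frame in cepstra) / num_frames
             for k in range(num_coeffs)]
    return [[frame[k] - means[k] for k in range(num_coeffs)]
            for frame in cepstra]


def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from each frame's spectrum.

    noisy_mag: list of frames, each a list of magnitude values per frequency bin.
    noise_mag: one noise magnitude estimate per bin (e.g. from speech pauses).
    floor: hypothetical flooring factor that keeps results positive.
    """
    return [[max(x - n, floor * x) for x, n in zip(frame, noise_mag)]
            for frame in noisy_mag]


# Usage: the same utterance with and without a constant channel offset
# yields identical features after normalization.
clean = [[1.0, 2.0], [3.0, 6.0]]
biased = [[1.5, 1.0], [3.5, 5.0]]  # clean plus the offset [0.5, -1.0]
normalized_clean = cepstral_mean_normalization(clean)
normalized_biased = cepstral_mean_normalization(biased)
```

The flooring step in `spectral_subtraction` is one common way to avoid negative magnitudes when the noise estimate exceeds the observed value; other variants exist.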
