Speaker adaptation using combined transformation and Bayesian methods

Adapting the parameters of a statistical speaker independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities. To improve the behavior of our adaptation scheme for large amounts of adaptation data, we combine it here with Bayesian techniques. We evaluate our algorithms on the large-vocabulary Wall Street Journal corpus for nonnative speakers of American English. The recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers.

[1]  Yves Grenier,et al.  Spectral transformations through canonical correlation analysis for speaker adptation in ASR , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Mitch Weintraub,et al.  Robust speech recognition in noise using adaptation and mapping techniques , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[3]  K.F. Lee,et al.  On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition , 1993, IEEE Trans. Speech Audio Process..

[4]  Biing-Hwang Juang,et al.  A study on speaker adaptation of the parameters of continuous density hidden Markov models , 1991, IEEE Trans. Signal Process..

[5]  Mitch Weintraub,et al.  Large-vocabulary dictation using SRI's DECIPHER speech recognition system: progressive search techniques , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Jonathan G. Fiscus,et al.  1993 Benchmark Tests for the ARPA Spoken Language Program , 1994, HLT.

[7]  Michael Picheny,et al.  Robust speaker adaptation using a piecewise linear acoustic mapping , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  H. Hartley Maximum Likelihood Estimation from Incomplete Data , 1958 .

[9]  Vassilios Digalakis,et al.  Speaker adaptation using constrained estimation of Gaussian mixtures , 1995, IEEE Trans. Speech Audio Process..

[10]  Satoshi Nakamura,et al.  A comparative study of spectral mapping for speaker adaptation , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[11]  R. Schwartz,et al.  A comparison of several approximate algorithms for finding multiple (N-best) sentence hypotheses , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[12]  R. Okafor Maximum likelihood estimation from incomplete data , 1987 .

[13]  R. Schwartz,et al.  Rapid speaker adaptation using a probabilistic spectral mapping , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  A. Sankar,et al.  Stochastic matching for robust speech recognition , 1994, IEEE Signal Processing Letters.

[15]  Mitch Weintraub,et al.  The Hub and Spoke Paradigm for CSR Evaluation , 1994, HLT.

[16]  Janet M. Baker,et al.  The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.

[17]  Jean-Luc Gauvain,et al.  Speaker adaptation based on MAP estimation of HMM parameters , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.