On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition

The DARPA Resource Management task is used as the domain to investigate the performance of speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. The authors already have a state-of-the-art speaker-independent speech recognition system, SPHINX. The error rate for RM2 test set is 4.3%. They extended SPHINX to speaker-dependent speech recognition. The error rate is reduced to 1.4-2.6% with 600-2400 training sentences for each speaker, which demonstrated a substantial difference between speaker-dependent and -independent systems. Based on speaker-independent models, a study was made of speaker-adaptive speech recognition. With 40 adaptation sentences for each speaker, the error rate can be reduced from 4.3% to 3.1%.<<ETX>>

[1]  D. B. Paul,et al.  Speaker stress-resistant continuous speech recognition , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[2]  Masafumi Nishimura,et al.  Speaker adaptation method for HMM-based speech recognition , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[3]  Jonathan G. Fiscus,et al.  DARPA Resource Management Benchmark Test Results June 1990 , 1990, HLT.

[4]  Marco Ferretti,et al.  Large-vocabulary speech recognition with speaker-adapted codebook and HMM parameters , 1989, EUROSPEECH.

[5]  D. B. Paul,et al.  The Lincoln robust continuous speech recognizer , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[6]  Aaron E. Rosenberg,et al.  Improved Acoustic Modeling for Continuous Speech Recognition , 1990, HLT.

[7]  X. D. Huang,et al.  Phoneme classification using semicontinuous hidden Markov models , 1992, IEEE Trans. Signal Process..

[8]  Robert M. Gray,et al.  An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..

[9]  Kai-Fu Lee,et al.  Automatic Speech Recognition , 1989 .

[10]  Satoshi Nakamura,et al.  Speaker adaptation applied to HMM and neural networks , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[11]  Hsiao-Wuen Hon,et al.  An overview of the SPHINX speech recognition system , 1990, IEEE Trans. Acoust. Speech Signal Process..

[12]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .

[13]  Xuedong Huang,et al.  Semi-continuous hidden Markov models for speech signals , 1990 .

[14]  Biing-Hwang Juang,et al.  A vector quantization approach to speaker recognition , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15]  Richard M. Stern,et al.  Environmental robustness in automatic speech recognition , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[16]  Kiyohiro Shikano,et al.  Speaker adaptation through vector quantization , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Richard M. Schwartz,et al.  Improved speaker adaption using text dependent spectral mappings , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[18]  Mei-Yuh Hwang,et al.  Improved acoustic modeling with the SPHINX speech recognition system , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[19]  Biing-Hwang Juang,et al.  Hidden Markov Models for Speech Recognition , 1991 .

[20]  Xuedong Huang,et al.  On semi-continuous hidden Markov modeling , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[21]  Mei-Yuh Hwang,et al.  Modeling between-word coarticulation in continuous speech recognition , 1989, EUROSPEECH.

[22]  S. Roucos,et al.  The role of word-dependent coarticulatory effects in a phoneme-based speech recognition system , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23]  R. Schwartz,et al.  Rapid speaker adaptation using a probabilistic spectral mapping , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[24]  John Makhoul,et al.  Context-dependent modeling for acoustic-phonetic recognition of continuous speech , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[25]  Richard M. Stern,et al.  Dynamic speaker adaptation for isolated letter recognition using MAP estimation , 1983, ICASSP.

[26]  Chin-Hui Lee,et al.  Bayesian adaptation in speech recognition , 1983, ICASSP.

[27]  Richard M. Schwartz,et al.  A New Paradigm for Speaker-Independent Training and Speaker Adaptation , 1990, HLT.