On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition

The DARPA Resource Management task is used as a domain for investigating the performance of speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. The error rate of the speaker-independent recognition system, SPHINX, was reduced substantially by incorporating between-word triphone models, additional dynamic features, and sex-dependent, semicontinuous hidden Markov models. The error rate for speaker-independent recognition was 4.3%. On speaker-dependent data, the error rate was further reduced to between 2.6% and 1.4% with 600 to 2400 training sentences for each speaker. Using speaker-independent models, the authors studied speaker-adaptive recognition. Both codebooks and output distributions were considered for adaptation. It was found that speaker-adaptive systems outperform both speaker-independent and speaker-dependent systems, suggesting that the most effective system is one that begins with speaker-independent training and continues to adapt to users.
