Paper information - On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition

The DARPA Resource Management task is used as a domain for investigating the performance of speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. The error rate of the speaker-independent recognition system, SPHINX, was reduced substantially by incorporating between-word triphone models, additional dynamic features, and sex-dependent, semicontinuous hidden Markov models. The error rate for speaker-independent recognition was 4.3%. On speaker-dependent data, the error rate was further reduced to between 2.6% and 1.4% with 600 to 2400 training sentences for each speaker. Using speaker-independent models, the authors studied speaker-adaptive recognition. Both codebooks and output distributions were considered for adaptation. It was found that speaker-adaptive systems outperform both speaker-independent and speaker-dependent systems, suggesting that the most effective system is one that begins with speaker-independent training and continues to adapt to users.

K.F. Lee | X. Huang

[1] Biing-Hwang Juang, et al. Hidden Markov Models for Speech Recognition, 1991.

[2] B. Merialdo, et al. Phoneme classification using Markov models, 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3] Marco Ferretti, et al. Large-vocabulary speech recognition with speaker-adapted codebook and HMM parameters, 1989, EUROSPEECH.

[4] Jonathan G. Fiscus, et al. DARPA Resource Management Benchmark Test Results June 1990, 1990, HLT.

[5] Richard M. Stern, et al. Dynamic speaker adaptation for isolated letter recognition using MAP estimation, 1983, ICASSP.

[6] Frederick Jelinek, et al. The development of an experimental discrete dictation recognizer, 1985.

[7] Kiyohiro Shikano, et al. Speaker adaptation through vector quantization, 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Richard M. Schwartz, et al. Improved speaker adaption using text dependent spectral mappings, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[9] Kai-Fu Lee, et al. Automatic Speech Recognition, 1989.

[10] Xuedong Huang, et al. A Study on Speaker-Adaptive Speech Recognition, 1991, HLT.

[11] Satoshi Nakamura, et al. Speaker adaptation applied to HMM and neural networks, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[12] Mei-Yuh Hwang, et al. Improved acoustic modeling with the SPHINX speech recognition system, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[13] Biing-Hwang Juang, et al. A vector quantization approach to speaker recognition, 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14] Chin-Hui Lee, et al. Bayesian adaptation in speech recognition, 1983, ICASSP.

[15] Xuedong Huang, et al. Semi-continuous hidden Markov models for speech signals, 1990.

[16] S. Roucos, et al. The role of word-dependent coarticulatory effects in a phoneme-based speech recognition system, 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17] R. Schwartz, et al. Rapid speaker adaptation using a probabilistic spectral mapping, 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18] Robert M. Gray, et al. An Algorithm for Vector Quantizer Design, 1980, IEEE Trans. Commun.

[19] Richard M. Schwartz, et al. A New Paradigm for Speaker-Independent Training and Speaker Adaptation, 1990, HLT.

[20] Mei-Yuh Hwang, et al. Improved Hidden Markov Modeling for Speaker-Independent Continuous Speech Recognition, 1990, HLT.

[21] Mei-Yuh Hwang, et al. Modeling between-word coarticulation in continuous speech recognition, 1989, EUROSPEECH.

[22] Masafumi Nishimura, et al. Speaker adaptation method for HMM-based speech recognition, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[23] Frederick Jelinek, et al. Interpolated estimation of Markov source parameters from sparse data, 1980.

[24] D. B. Paul, et al. The Lincoln robust continuous speech recognizer, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[25] Aaron E. Rosenberg, et al. Improved Acoustic Modeling for Continuous Speech Recognition, 1990, HLT.

[26] Hsiao-Wuen Hon, et al. An overview of the SPHINX speech recognition system, 1990, IEEE Trans. Acoust. Speech Signal Process.