Towards Speech Recognition Without Vocabulary-Specific Training

With the emergence of high-performance speaker-independent systems, a major barrier to man-machine communication has been overcome. This work describes our next step toward improving the usability of speech recognizers—the use of vocabulary-independent (VI) models. If successful, VI models need be trained only once: they would completely eliminate task-specific training and enable rapid configuration of speech recognizers for new vocabularies. Our initial results using generalized triphones as VI models show that with more training data and more detailed modeling, the error rate of VI models can be reduced substantially. For example, the error rates for VI models trained on 5,000, 10,000, and 15,000 sentences are 23.9%, 15.2%, and 13.3%, respectively. Moreover, if task-specific training data are available, we can interpolate them with the VI models. Our preliminary results show that this interpolation can lead to an 18% error-rate reduction over task-specific models.
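The interpolation idea above can be sketched as a weighted combination of the two models' probability estimates. This is only an illustrative sketch—the function and weight below are hypothetical, and in practice the interpolation weight would be estimated from held-out data (e.g., by deleted interpolation) rather than fixed by hand:

```python
def interpolate_models(vi_probs, task_probs, lam):
    """Linearly combine vocabulary-independent (VI) and task-specific
    probability estimates for the same set of events.

    vi_probs, task_probs: dicts mapping an event (e.g., a triphone state
    output symbol) to a probability; lam: weight on the task-specific
    model, a hypothetical fixed value here (learned from data in practice).
    """
    assert vi_probs.keys() == task_probs.keys()
    return {event: lam * task_probs[event] + (1.0 - lam) * vi_probs[event]
            for event in vi_probs}

# Toy distributions over two events; a sparse task-specific estimate is
# smoothed by the better-trained VI estimate.
vi = {"a": 0.2, "b": 0.8}
task = {"a": 0.6, "b": 0.4}
combined = interpolate_models(vi, task, lam=0.5)
# combined["a"] == 0.4, combined["b"] == 0.6; still sums to 1.
```

The combined estimate remains a valid distribution whenever both inputs are, since the weights sum to one.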
