On vocabulary-independent speech modeling

The use of vocabulary-independent (VI) models to improve the usability of speech recognizers is described. Initial results using generalized triphones as VI models show that, with more training data and more detailed modeling, the error rate of VI models can be reduced substantially. For example, the error rates for VI models with 5000, 10000, and 15000 training sentences are 23.9%, 15.2%, and 13.3%, respectively. Moreover, if task-specific training data are available, they can be interpolated with the VI models. This task adaptation reduces the error rate by 18% over task-specific models.
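The task adaptation described above combines two sets of parameter estimates. As a minimal sketch (not the authors' implementation), the core idea can be illustrated as a linear interpolation of a VI output distribution with a task-specific one; the weight `lam` is hypothetical here and would normally be tuned on held-out data, e.g. by deleted interpolation:

```python
def interpolate(vi_probs, task_probs, lam):
    """Linearly combine two discrete output distributions:
    lam * task-specific + (1 - lam) * vocabulary-independent."""
    assert vi_probs.keys() == task_probs.keys()
    return {k: lam * task_probs[k] + (1 - lam) * vi_probs[k]
            for k in vi_probs}

# Illustrative (made-up) distributions over three output symbols.
vi = {"aa": 0.2, "iy": 0.5, "s": 0.3}
task = {"aa": 0.4, "iy": 0.4, "s": 0.2}

combined = interpolate(vi, task, lam=0.5)
# With lam = 0.5 each entry is the average of the two estimates,
# e.g. combined["aa"] == 0.3, and the result still sums to 1.
```

A larger `lam` trusts the (possibly sparse) task-specific estimates more; a smaller `lam` falls back on the better-trained VI models.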
