A 10000-word continuous-speech recognition system

Results are reported for increasing the recognition vocabulary of a phoneme-based, speaker-dependent continuous-speech recognizer from 1000 to 10000 words. The potential search space grew from 46000 to 516000 states, and the data-driven search handled the increase without problems. Increasing the recognition vocabulary by a factor of 10 (from a perplexity of 917 to 9686) increased the word error rate by a factor of two (from 21.8% to 43.1%). Phoneme models were tested with both discrete probabilities and continuous mixture densities. The mixture density models performed better and also saved about half of the search cost. A language model was found to be very important at the larger vocabulary size: with a bigram model, the test set perplexity was 388 (a reduction by a factor of 25 compared to the case without a bigram model) and the error rate decreased by a factor of 2.4. To check how meaningful perplexity is for predicting the system's performance, a stochastic language model was constructed with a perplexity of 1000, the size of the vocabulary used in previous experiments, and about the same error rate was obtained as in those experiments.
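Since the abstract uses test set perplexity as its measure of task difficulty, a small worked example may help make the quantity concrete. The sketch below is illustrative only and is not the system's language model: the toy corpus, the add-alpha smoothing, and the names `train_bigram`, `uniform_model`, and `perplexity` are all assumptions introduced here. It computes perplexity as the exponential of the negative average log probability per word, and shows that a uniform model over V words has perplexity V while a bigram model trained on matching data scores lower.

```python
import math
from collections import Counter

def train_bigram(sentences, vocab, alpha=1.0):
    """Return p(word | prev) for an add-alpha smoothed bigram model.

    Add-alpha smoothing is an assumption of this sketch, not the
    estimation method used in the paper.
    """
    context_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, word)] += 1
    v = len(vocab)

    def prob(word, prev):
        return (pair_counts[(prev, word)] + alpha) / (context_counts[prev] + alpha * v)

    return prob

def uniform_model(vocab):
    """Every word is equally likely regardless of its predecessor."""
    v = len(vocab)
    return lambda word, prev: 1.0 / v

def perplexity(prob, sentences):
    """exp of the negative mean log probability per predicted word."""
    log_sum = 0.0
    n_words = 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence
        for prev, word in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(word, prev))
            n_words += 1
    return math.exp(-log_sum / n_words)

# Hypothetical toy data, chosen only to make the numbers easy to follow.
vocab = ["a", "b", "c", "d"]
train = [["a", "b", "c", "d"], ["a", "b", "c", "d"], ["a", "c", "b", "d"]]
test = [["a", "b", "c", "d"]]

pp_uniform = perplexity(uniform_model(vocab), test)
pp_bigram = perplexity(train_bigram(train, vocab), test)
print(f"uniform: {pp_uniform:.2f}, bigram: {pp_bigram:.2f}")
```

In the same terms, the factor of 25 quoted in the abstract corresponds to the ratio 9686 / 388 between the perplexity without and with the bigram model.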
