EXPANSION OF CONTEXT-DEPENDENT NETWORKS IN LARGE VOCABULARY SPEECH RECOGNITION

We combine our earlier approach to context-dependent network representation with our algorithm for determinizing weighted networks to build optimized networks for large-vocabulary speech recognition combining an -gram language model, a pronunciation dictionary and context-dependency modeling. While fullyexpanded networks have been used before in restrictive settings (medium vocabulary or no cross-word contexts), we demonstrate that our network determinization method makes it practical to use fully-expanded networks also in large-vocabulary recognition with full cross-word context modeling. For the DARPA North American Business News task (NAB), we give network sizes and recognition speeds and accuracies using bigram and trigram grammars with vocabulary sizes ranging from 10,000 to 160,000 words. With our construction, the fully-expanded NAB context-dependent networks contain only about twice as many arcs as the corresponding language models. Interestingly, we also find that, with these networks,real-timeword accuracy is improved by increasing vocabulary size andn-gram order.

[1]  Fernando Pereira,et al.  Transducer composition for context-dependent network expansion , 1997, EUROSPEECH.

[2]  Mehryar Mohri,et al.  Finite-State Transducers in Language and Speech Processing , 1997, CL.

[3]  Hermann Ney,et al.  A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[4]  Ronald Rosenfeld,et al.  Scalable backoff language models , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[5]  Hermann Ney,et al.  Language-model look-ahead for large vocabulary speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6]  Michael Riley,et al.  Speech Recognition by Composition of Weighted Finite Automata , 1996, ArXiv.

[7]  Mehryar Mohri,et al.  On some applications of finite-state automata theory to natural language processing , 1996, Nat. Lang. Eng..

[8]  Roberto Pieraccini,et al.  Non-deterministic stochastic language models for speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[9]  Steve J. Young,et al.  A One Pass Decoder Design For Large Vocabulary Recognition , 1994, HLT.

[10]  Steve J. Young,et al.  Tree-Based State Tying for High Accuracy Modelling , 1994, HLT.

[11]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[12]  Jean Berstel,et al.  Transductions and context-free languages , 1979, Teubner Studienbücher : Informatik.