A weight pushing algorithm for large vocabulary speech recognition

Weighted finite-state transducers provide a general framework for the representation of the components of speech recognition systems; language models, pronunciation dictionaries, context-dependent models, HMM-level acoustic models, and the output word or phone lattices can all be represented by weighted automata and transducers. In general, a representation is not unique and there may be different weighted transducers realizing the same mapping. In particular, even when they have exactly the same topology with the same input and output labels, two equivalent transducers may differ by the way the weights are distributed along each path. We present a weight pushing algorithm that modifies the weights of a given weighted transducer in such a way that the transition probabilities form a stochastic distribution. This results in an equivalent transducer whose weight distribution is more suitable for pruning and speech recognition. We demonstrate substantial improvements in the speed of our recognition system in several tasks based on the use of this algorithm. We report a 45% speedup at 83% word accuracy with a simple single-pass 40,000-word vocabulary North American Business News (NAB) recognition system on the DARPA Eval '95 test set. With the same technique, we report a 550% speedup at 88% word accuracy in rescoring NAB word lattices with more accurate 2nd-pass models. We finally report a 280% speedup at 68% word accuracy for the recognition of 100,000 first name-last name pairs.
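To make the idea of pushing concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes an acyclic automaton over ordinary (real, non-negative) probabilities-like weights in which every state can reach a final state. The function name `push_weights` and the tuple-based arc representation are hypothetical choices made for illustration. For each state q it computes d[q], the total weight of all paths from q to a final state, then rescales each transition from p to q by d[q] / d[p], so that the outgoing weights at every state (together with its final weight) sum to one while every complete path keeps its original weight.

```python
from collections import defaultdict


def push_weights(arcs, finals, start):
    """Push weights toward the start state (illustrative sketch).

    arcs:   list of (src, label, weight, dst) tuples
    finals: dict mapping final state -> final weight
    start:  the single initial state
    Assumes an acyclic automaton where every state reaches a final state.
    Returns (pushed_arcs, pushed_finals, initial_weight).
    """
    out = defaultdict(list)
    for src, label, w, dst in arcs:
        out[src].append((label, w, dst))

    d = {}  # d[q] = total weight of all paths from q to a final state

    def total(q):
        if q not in d:
            d[q] = finals.get(q, 0.0) + sum(
                w * total(dst) for _, w, dst in out[q]
            )
        return d[q]

    # Rescale each transition; the leftover mass d[start] becomes the
    # initial weight so that path weights are unchanged.
    pushed_arcs = [
        (src, label, w * total(dst) / total(src), dst)
        for src, label, w, dst in arcs
    ]
    pushed_finals = {q: rho / total(q) for q, rho in finals.items()}
    return pushed_arcs, pushed_finals, total(start)


# Toy example: two paths, "a c" with weight 2.0 and "b d" with weight 3.0.
arcs = [
    (0, "a", 2.0, 1),
    (0, "b", 6.0, 2),
    (1, "c", 1.0, 3),
    (2, "d", 0.5, 3),
]
finals = {3: 1.0}
pushed_arcs, pushed_finals, initial_weight = push_weights(arcs, finals, 0)
```

In this toy example the two transitions leaving state 0 end up with weights 0.4 and 0.6, and the initial weight is 5.0, so the path "a c" still has total weight 5.0 × 0.4 × 1.0 × 1.0 = 2.0. A pruning search that compares partial paths early sees the final relative weights immediately, which is the intuition behind the speedups reported above. Handling cyclic machines or other semirings (for example, negative log probabilities) requires a more general shortest-distance computation than this sketch provides.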
