Paper information - A weight pushing algorithm for large vocabulary speech recognition

A weight pushing algorithm for large vocabulary speech recognition

Weighted finite-state transducers provide a general framework for the representation of the components of speech recognition systems; language models, pronunciation dictionaries, context-dependent models, HMM-level acoustic models, and the output word or phone lattices can all be represented by weighted automata and transducers. In general, a representation is not unique and there may be different weighted transducers realizing the same mapping. In particular, even when they have exactly the same topology with the same input and output labels, two equivalent transducers may differ by the way the weights are distributed along each path. We present a weight pushing algorithm that modifies the weights of a given weighted transducer in a way such that the transition probabilities form a stochastic distribution. This results in an equivalent transducer whose weight distribution is more suitable for pruning and speech recognition. We demonstrate substantial improvements of the speed of our recognition system in several tasks based on the use of this algorithm. We report a 45% speedup at 83% word accuracy with a simple single-pass 40,000-word vocabulary North American Business News (NAB) recognition system on the DARPA Eval ’95 test set. With the same technique, we report a 550% speedup at 88% word accuracy in rescoring NAB word lattices with more accurate 2nd-pass models. We finally report a 280% speedup at 68% word accuracy for 100,000 first name-last name pairs recognition.
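The idea of redistributing weights so that each state's outgoing weights sum to one can be illustrated with a small sketch. This is not the paper's implementation: it assumes an acyclic automaton over the probability semiring (weights multiply along a path and add across paths), and the function name `push_weights` and the tuple layout for arcs are hypothetical choices made for illustration. For each state q it computes d[q], the total weight of all paths from q to a final state, then rescales each arc from p to q by d[q] / d[p].

```python
from collections import defaultdict

def push_weights(arcs, finals, start):
    """Illustrative weight pushing in the probability semiring.

    arcs:   list of (src, label, weight, dst) tuples (hypothetical layout)
    finals: dict mapping final state -> final weight
    start:  the single initial state
    Assumes the automaton is acyclic and every state can reach a final state.
    Returns (pushed_arcs, pushed_finals, initial_weight).
    """
    out = defaultdict(list)
    for src, label, w, dst in arcs:
        out[src].append((label, w, dst))

    # d[q] = total weight of all paths from q to a final state.
    d = {}

    def total(q):
        if q not in d:
            d[q] = finals.get(q, 0.0) + sum(w * total(dst) for _, w, dst in out[q])
        return d[q]

    states = {start} | set(finals)
    for src, _, _, dst in arcs:
        states.update((src, dst))
    for q in states:
        total(q)

    # Rescale so that outgoing weights plus final weight sum to 1 at each state.
    pushed_arcs = [(s, l, w * d[t] / d[s], t) for s, l, w, t in arcs]
    pushed_finals = {q: rho / d[q] for q, rho in finals.items()}
    # The mass removed from the arcs is kept as an initial weight,
    # so every complete path keeps its original total weight.
    return pushed_arcs, pushed_finals, d[start]


# Two paths, "a c" with weight 0.2 and "b d" with weight 0.6.
arcs = [(0, "a", 1.0, 1), (0, "b", 1.0, 2), (1, "c", 0.2, 3), (2, "d", 0.6, 3)]
pushed_arcs, pushed_finals, initial = push_weights(arcs, {3: 1.0}, 0)
for arc in pushed_arcs:
    print(arc)
print("initial weight:", initial)
```

In the original automaton both arcs leaving state 0 carry weight 1.0, so a pruning search learns nothing about which branch is better until it reaches the second arc. After pushing, the arcs leaving state 0 carry roughly 0.25 and 0.75, which makes the difference visible at the first step; this is the property the abstract describes as a weight distribution more suitable for pruning.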

Mehryar Mohri | Michael Riley
