Paper information - 3-Way Composition of Weighted Finite-State Transducers

Composition of weighted transducers is a fundamental algorithm used in many applications, including computing complex edit-distances between automata, computing string kernels in machine learning, and combining different components of a speech recognition, speech synthesis, or information extraction system. We present a generalization of the composition of weighted transducers, 3-way composition, which is dramatically faster in practice than the standard composition algorithm when combining more than two transducers. The worst-case complexity of our algorithm for composing three transducers $T_1$, $T_2$, and $T_3$ resulting in $T$ is $O(|T|_Q \min(d(T_1)\, d(T_3), d(T_2)) + |T|_E)$, where $|\cdot|_Q$ denotes the number of states, $|\cdot|_E$ the number of transitions, and $d(\cdot)$ the maximum out-degree. As in regular composition, the use of perfect hashing requires a pre-processing step with linear-time expected complexity in the size of the input transducers. In many cases, this approach significantly improves on the complexity of standard composition. Our algorithm also leads to a dramatically faster composition in practice. Furthermore, standard composition can be obtained as a special case of our algorithm. We report the results of several experiments demonstrating this improvement. These theoretical and empirical improvements significantly enhance performance in the applications already mentioned.
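To make the $\min(d(T_1)\, d(T_3), d(T_2))$ term concrete, here is a minimal Python sketch of a 3-way composition. It is not the authors' implementation. It assumes epsilon-free transducers over the tropical semiring (weights add along a path), uses Python dictionaries as a stand-in for perfect hashing, and picks the cheaper enumeration order at each state triple, which is this sketch's own heuristic. The dictionary layout and the names `index_by`, `lookup`, and `compose3` are hypothetical.

```python
from collections import defaultdict, deque

# Assumed layout (hypothetical): a transducer is a dict with
#   "start":  start state
#   "finals": {state: final weight}
#   "arcs":   {state: [(ilabel, olabel, weight, next_state), ...]}

def index_by(arcs, key):
    """Pre-processing: hash each state's outgoing arcs by key(arc)."""
    table = defaultdict(lambda: defaultdict(list))
    for state, out in arcs.items():
        for arc in out:
            table[state][key(arc)].append(arc)
    return table

def lookup(table, state, key):
    """Arcs leaving `state` that match `key`, without mutating the table."""
    return table.get(state, {}).get(key, [])

def compose3(t1, t2, t3):
    by_out1 = index_by(t1["arcs"], lambda a: a[1])          # T1 by output label
    by_io2 = index_by(t2["arcs"], lambda a: (a[0], a[1]))   # T2 by (input, output)
    by_in3 = index_by(t3["arcs"], lambda a: a[0])           # T3 by input label

    start = (t1["start"], t2["start"], t3["start"])
    arcs, finals = defaultdict(list), {}
    queue, seen = deque([start]), {start}
    while queue:
        q = queue.popleft()
        q1, q2, q3 = q
        if q1 in t1["finals"] and q2 in t2["finals"] and q3 in t3["finals"]:
            finals[q] = t1["finals"][q1] + t2["finals"][q2] + t3["finals"][q3]
        e1 = t1["arcs"].get(q1, [])
        e2 = t2["arcs"].get(q2, [])
        e3 = t3["arcs"].get(q3, [])
        if len(e1) * len(e3) <= len(e2):
            # Enumerate (T1, T3) arc pairs and look up the matching T2 arcs.
            matches = [(a1, a2, a3) for a1 in e1 for a3 in e3
                       for a2 in lookup(by_io2, q2, (a1[1], a3[0]))]
        else:
            # Enumerate T2 arcs and look up the matching T1 and T3 arcs.
            matches = [(a1, a2, a3) for a2 in e2
                       for a1 in lookup(by_out1, q1, a2[0])
                       for a3 in lookup(by_in3, q3, a2[1])]
        for a1, a2, a3 in matches:
            nxt = (a1[3], a2[3], a3[3])
            # Input label from T1, output label from T3, weights combined by +.
            arcs[q].append((a1[0], a3[1], a1[2] + a2[2] + a3[2], nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": dict(arcs)}

# Tiny example: T1 maps a to b, T2 maps b to c, T3 maps c to d.
t1 = {"start": 0, "finals": {1: 0.0}, "arcs": {0: [("a", "b", 1.0, 1)]}}
t2 = {"start": 0, "finals": {1: 0.0}, "arcs": {0: [("b", "c", 2.0, 1)]}}
t3 = {"start": 0, "finals": {1: 0.5}, "arcs": {0: [("c", "d", 3.0, 1)]}}
result = compose3(t1, t2, t3)
```

The sketch never builds the intermediate transducer that a cascade of two standard compositions would create. It only explores state triples reachable from the start and does not trim states that cannot reach a final state.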

Cyril Allauzen | Mehryar Mohri
