Approximating Probabilistic Models as Weighted Finite Automata

Weighted finite automata (WFA) are often used to represent probabilistic models, such as $n$-gram language models, since they are time- and space-efficient for recognition tasks. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the target WFA model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on various tasks, including distilling $n$-gram models from neural models, building compact language models, and building open-vocabulary character models.
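To make the counting step concrete, the following is a minimal sketch (not the paper's algorithm) of how expected $n$-gram counts under a source distribution can be accumulated and normalized into a bigram WFA, where each state is the previous symbol and arc weights are conditional probabilities. The toy source distribution, symbol names, and helper functions are all hypothetical illustrations:

```python
from collections import defaultdict

# Hypothetical toy source: an explicit distribution over strings.
# (The paper targets arbitrary probabilistic sources, e.g. neural models.)
SOURCE = {"ab": 0.4, "aab": 0.3, "b": 0.2, "abb": 0.1}

def weighted_bigram_counts(source):
    """Counting step: accumulate bigram counts weighted by source probability."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq, p in source.items():
        prev = "<s>"
        for sym in list(seq) + ["</s>"]:
            counts[prev][sym] += p
            prev = sym
    return counts

def normalize(counts):
    """Build the WFA: state = previous symbol, arc weight = P(sym | state)."""
    wfa = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        wfa[state] = {sym: c / total for sym, c in nxt.items()}
    return wfa

def wfa_prob(wfa, seq):
    """Probability the bigram WFA assigns to a complete sequence."""
    p, prev = 1.0, "<s>"
    for sym in list(seq) + ["</s>"]:
        p *= wfa.get(prev, {}).get(sym, 0.0)
        prev = sym
    return p

wfa = normalize(weighted_bigram_counts(SOURCE))
```

For a fixed $n$-gram topology, normalizing these expected counts is the maximum-likelihood estimate, which is the KL-minimizing model within that class; the difference-of-convex step in the paper handles the harder settings where this closed form does not apply.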
