Approximating Probabilistic Models as Weighted Finite Automata

Weighted finite automata (WFA) are often used to represent probabilistic models, such as $n$-gram language models, since they are time- and space-efficient for recognition tasks. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the target WFA model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on various tasks, including distilling $n$-gram models from neural models, building compact language models, and building open-vocabulary character models.
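To make the counting step concrete, the following is a minimal sketch (not the paper's algorithm) of how expected $n$-gram counts under a source distribution can be accumulated and normalized into a bigram WFA, where each state is the previous symbol and arc weights are conditional probabilities. The toy source distribution, symbol names, and helper functions are all hypothetical illustrations:

```python
from collections import defaultdict

# Hypothetical toy source: an explicit distribution over strings.
# (The paper targets arbitrary probabilistic sources, e.g. neural models.)
SOURCE = {"ab": 0.4, "aab": 0.3, "b": 0.2, "abb": 0.1}

def weighted_bigram_counts(source):
    """Counting step: accumulate bigram counts weighted by source probability."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq, p in source.items():
        prev = "<s>"
        for sym in list(seq) + ["</s>"]:
            counts[prev][sym] += p
            prev = sym
    return counts

def normalize(counts):
    """Build the WFA: state = previous symbol, arc weight = P(sym | state)."""
    wfa = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        wfa[state] = {sym: c / total for sym, c in nxt.items()}
    return wfa

def wfa_prob(wfa, seq):
    """Probability the bigram WFA assigns to a complete sequence."""
    p, prev = 1.0, "<s>"
    for sym in list(seq) + ["</s>"]:
        p *= wfa.get(prev, {}).get(sym, 0.0)
        prev = sym
    return p

wfa = normalize(weighted_bigram_counts(SOURCE))
```

For a fixed $n$-gram topology, normalizing these expected counts is the maximum-likelihood estimate, which is the KL-minimizing model within that class; the difference-of-convex step in the paper handles the harder settings where this closed form does not apply.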
