A Canonical Form for Weighted Automata and Applications to Approximate Minimization

We study the problem of constructing approximations to a weighted automaton. Weighted finite automata (WFA) are closely related to the theory of rational series. A rational series is a function from strings to real numbers that can be computed by a WFA. Examples include probability distributions generated by hidden Markov models and probabilistic automata. The relationship between rational series and WFA is analogous to the relationship between regular languages and ordinary automata. Associated with such rational series are infinite matrices called Hankel matrices, which play a fundamental role in the theory of minimal WFA. Our contributions are: (1) an effective procedure for computing the singular value decomposition (SVD) of such infinite Hankel matrices based on their finite representation in terms of WFA, (2) a new canonical form for WFA based on this SVD, and (3) an algorithm to construct approximate minimizations of a given WFA. The goal of our approximate minimization algorithm is to start from a minimal WFA and produce a smaller WFA that is close to the given one in a certain sense. The desired size of the approximating automaton is given as input. We give bounds describing how well the approximation emulates the behavior of the original WFA. The study of this problem is motivated by the analysis of machine learning algorithms that synthesize weighted automata from spectral decompositions of finite Hankel matrices. It is known that when the number of states of the target automaton is correctly guessed, these algorithms enjoy consistency and finite-sample guarantees in the probably approximately correct (PAC) learning model. It has also been suggested that asking the learning algorithm to produce a model smaller than the true one will still yield useful models with reduced complexity. Our results in this paper vindicate these ideas and confirm intuitions provided by empirical studies.
Beyond learning problems, our techniques can also be used to reduce the complexity of any algorithm working with WFA, at the expense of incurring a small, controlled amount of error.
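
To make the objects named above concrete, the following minimal sketch evaluates a rational series with a small WFA and inspects a finite block of its Hankel matrix. It is an illustration only, not the procedure of this paper, which works with the infinite Hankel matrix. The two-state automaton, its weights, and the helper names `wfa_value`, `words_up_to`, and `hankel_block` are all assumptions made for the example. The sketch relies on the standard fact that any block of the Hankel matrix has rank at most the number of states of a WFA computing the series.

```python
import itertools
import numpy as np

# Hypothetical 2-state WFA over the alphabet {a, b}; all weights are made up.
alpha = np.array([1.0, 0.0])   # initial weights
beta = np.array([0.2, 0.3])    # final weights
A = {
    "a": np.array([[0.3, 0.2], [0.0, 0.4]]),
    "b": np.array([[0.1, 0.2], [0.1, 0.2]]),
}

def wfa_value(word):
    """Value of the rational series: f(x1...xn) = alpha^T A_x1 ... A_xn beta."""
    v = alpha
    for symbol in word:
        v = v @ A[symbol]
    return float(v @ beta)

def words_up_to(length, alphabet="ab"):
    """All strings over the alphabet with at most `length` symbols."""
    return ["".join(w)
            for n in range(length + 1)
            for w in itertools.product(alphabet, repeat=n)]

def hankel_block(prefixes, suffixes):
    """Finite Hankel block with entries H[p, s] = f(p + s)."""
    return np.array([[wfa_value(p + s) for s in suffixes] for p in prefixes])

basis = words_up_to(2)           # 7 strings: "", a, b, aa, ab, ba, bb
H = hankel_block(basis, basis)   # 7 x 7 finite Hankel block

# Singular values of the finite block; at most 2 are numerically nonzero
# because the automaton has 2 states.
singular_values = np.linalg.svd(H, compute_uv=False)
rank = np.linalg.matrix_rank(H, tol=1e-10)
```

Here `singular_values` and `rank` describe only the chosen finite block. Dropping the smallest singular values of such a block is the kind of truncation used by the spectral learning algorithms mentioned above.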
