Learning Probability Distributions Generated by Finite-State Machines

We review methods for inferring probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that provably learn in two formal models: inference in the limit and PAC learning. The methods we review are state-merging and state-splitting methods for probabilistic deterministic automata and the recently developed spectral method for nondeterministic probabilistic automata. In both cases, we derive them from a high-level algorithm described in terms of the Hankel matrix of the distribution to be learned, given as an oracle, and then describe how to adapt that algorithm to account for the error introduced by a finite sample.
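To make the Hankel-matrix view concrete, the following is a minimal sketch of the oracle version of a spectral method, under stated assumptions. The 2-state target automaton, the basis of prefixes and suffixes, and the names `evaluate`, `target`, and `spectral_learn` are all hypothetical and chosen only for illustration; they are not taken from the paper. The sketch builds the Hankel block H[u, v] = f(uv) from exact oracle values, takes a rank-truncated SVD, and recovers operators that compute the same function f.

```python
import numpy as np

# Hypothetical 2-state target over the alphabet {a, b} (illustration only).
# For each state, outgoing weights plus the stopping weight sum to 1,
# so the automaton defines a probability distribution over strings.
ALPHA0 = np.array([1.0, 0.0])            # initial weights
ALPHA_INF = np.array([0.2, 0.4])         # stopping weights
OPS = {
    "a": np.array([[0.3, 0.0], [0.2, 0.1]]),
    "b": np.array([[0.0, 0.5], [0.1, 0.2]]),
}


def evaluate(alpha0, ops, alpha_inf, word):
    """Compute alpha0^T A_{x1} ... A_{xk} alpha_inf for word = x1...xk."""
    vec = alpha0
    for symbol in word:
        vec = vec @ ops[symbol]
    return float(vec @ alpha_inf)


def target(word):
    """Exact oracle for the probability of a word under the target."""
    return evaluate(ALPHA0, OPS, ALPHA_INF, word)


def spectral_learn(oracle, prefixes, suffixes, alphabet, rank):
    """Recover operators from exact Hankel blocks (oracle setting).

    Assumes prefixes and suffixes both contain the empty string and that
    the Hankel block over this basis has the given rank.
    """
    # Hankel block: H[u, v] = f(uv)
    H = np.array([[oracle(u + v) for v in suffixes] for u in prefixes])
    # Top right singular vectors span the row space of H
    _, _, vt = np.linalg.svd(H)
    V = vt[:rank].T
    HV_pinv = np.linalg.pinv(H @ V)
    ops = {}
    for a in alphabet:
        # Shifted block: H_a[u, v] = f(u a v)
        H_a = np.array([[oracle(u + a + v) for v in suffixes] for u in prefixes])
        ops[a] = HV_pinv @ H_a @ V
    # Row of H for the empty prefix, column of H for the empty suffix
    alpha0 = H[prefixes.index(""), :] @ V
    alpha_inf = HV_pinv @ H[:, suffixes.index("")]
    return alpha0, ops, alpha_inf


basis = ["", "a", "b"]
learned = spectral_learn(target, basis, basis, "ab", rank=2)
```

With exact oracle values the recovered operators are a change of basis of the target's, so `evaluate(*learned, w)` should agree with `target(w)` up to floating-point error for any word `w`. In the finite-sample setting, one would replace the oracle values with empirical frequencies from the sample, and the agreement becomes approximate.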
