Links between multiplicity automata, observable operator models and predictive state representations: a unified learning framework

Stochastic multiplicity automata (SMA) are weighted finite automata that generalize probabilistic automata and have been used for probabilistic grammatical inference. Observable operator models (OOMs) generalize hidden Markov models, which are models for discrete-valued stochastic processes used widely in speech recognition and bio-sequence modeling. Predictive state representations (PSRs) extend OOMs to stochastic input-output systems and are employed in agent modeling and planning. We present SMA, OOMs, and PSRs under the common framework of sequential systems, an algebraic characterization of multiplicity automata, and examine the precise relationships between them. Furthermore, we establish a unified approach to learning such models from data. Many previously proposed learning algorithms can be understood as variations of this basic learning scheme, and several turn out to be closely related to each other, or even equivalent.
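To make the shared structure concrete, the following minimal sketch shows how a hidden Markov model can be rewritten as one matrix operator per symbol, so that the probability of an observation sequence becomes a product of operators between an initial vector and an all-ones vector. This is the standard operator form of an HMM, not the paper's learning algorithm; the function names, the toy matrices `T` and `O`, and the emit-then-transition convention are assumptions chosen for illustration.

```python
from itertools import product


def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def hmm_to_operators(T, O):
    """Build one operator per symbol a with A_a[i][j] = O[i][a] * T[i][j].

    Assumed convention: emit symbol a from state i, then move to state j.
    T[i][j] is the transition probability, O[i][a] the emission probability.
    """
    n_states, n_symbols = len(T), len(O[0])
    return [
        [[O[i][a] * T[i][j] for j in range(n_states)] for i in range(n_states)]
        for a in range(n_symbols)
    ]


def word_probability(init, ops, word):
    """Evaluate f(w) = init^T A_{w1} ... A_{wn} 1, working right to left."""
    v = [1.0] * len(init)
    for a in reversed(word):
        v = mat_vec(ops[a], v)
    return sum(p * x for p, x in zip(init, v))


# Hypothetical two-state, two-symbol HMM used only as a toy example.
T = [[0.9, 0.1], [0.2, 0.8]]
O = [[0.7, 0.3], [0.1, 0.9]]
init = [0.5, 0.5]
ops = hmm_to_operators(T, O)

p_single = word_probability(init, ops, [0])
total_len3 = sum(word_probability(init, ops, w) for w in product(range(2), repeat=3))
```

In this toy model the probabilities of all words of a fixed length sum to one, which is the stochastic-process view associated with OOMs. The same evaluation rule applies unchanged if the operators contain negative entries, which is where multiplicity automata and OOMs go beyond HMMs.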
