Spectral Learning of Predictive State Representations with Insufficient Statistics

Predictive state representations (PSRs) are models of dynamical systems that represent state as a vector of predictions about future observable events (tests) conditioned on past observed events (histories). If a practitioner selects finite sets of tests and histories that are known to be sufficient to completely capture the system, an exact PSR can be learned in polynomial time using spectral methods. However, most real-world systems are complex, and in practice computational constraints limit us to small sets of tests and histories which are therefore never truly sufficient. How, then, should we choose these sets? Existing theory offers little guidance here, and yet we show that the choice is highly consequential—tests and histories selected at random or by a naive rule significantly underperform the best sets. In this paper we approach the problem both theoretically and empirically. While any fixed system can be represented by an infinite number of equivalent but distinct PSRs, we show that in the computationally unconstrained setting, where existing theory guarantees accurate predictions, the PSRs learned by spectral methods always satisfy a particular spectral bound. Adapting this idea, we propose a simple algorithmic technique to search for sets of tests and histories that approximately satisfy the bound while respecting computational limits. Empirically, our method significantly reduces prediction errors compared to standard spectral learning approaches.

[1]  Le Song,et al.  A Spectral Algorithm for Latent Tree Graphical Models , 2011, ICML.

[2]  Peter Stone,et al.  Learning Predictive State Representations , 2003, ICML.

[3]  Ariadna Quattoni,et al.  A Spectral Learning Algorithm for Finite State Transducers , 2011, ECML/PKDD.

[4]  Alex Kulesza,et al.  Low-Rank Spectral Learning , 2014, AISTATS.

[5]  Ariadna Quattoni,et al.  Spectral learning of weighted automata , 2014, Machine Learning.

[6]  Herbert Jaeger,et al.  Observable Operator Models for Discrete Stochastic Time Series , 2000, Neural Computation.

[7]  Charles R. Johnson,et al.  Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.

[8]  Byron Boots,et al.  Closing the learning-planning loop with predictive state representations , 2009, Int. J. Robotics Res..

[9]  Byron Boots,et al.  Reduced-Rank Hidden Markov Models , 2009, AISTATS.

[10]  Richard S. Sutton,et al.  Predictive Representations of State , 2001, NIPS.

[11]  Michael R. James,et al.  Predictive State Representations: A New Theory for Modeling Dynamical Systems , 2004, UAI.

[12]  Dean Alderucci A SPECTRAL ALGORITHM FOR LEARNING HIDDEN MARKOV MODELS THAT HAVE SILENT STATES , 2015 .

[13]  Anima Anandkumar,et al.  A Spectral Algorithm for Latent Dirichlet Allocation , 2012, Algorithmica.

[14]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[15]  Geoffrey E. Hinton,et al.  Generating Text with Recurrent Neural Networks , 2011, ICML.

[16]  Michael R. James,et al.  Learning and discovery of predictive state representations in dynamical systems with reset , 2004, ICML.

[17]  New York Dover,et al.  ON THE CONVERGENCE PROPERTIES OF THE EM ALGORITHM , 1983 .