Predictive State Recurrent Neural Networks

We present a new model, called Predictive State Recurrent Neural Networks (PSRNNs), for filtering and prediction in dynamical systems. PSRNNs draw on insights from both Recurrent Neural Networks (RNNs) and Predictive State Representations (PSRs), and inherit advantages from both types of models. Like many successful RNN architectures, PSRNNs use (potentially deeply composed) bilinear transfer functions to combine information from multiple sources, so that one source can act as a gate for another. These bilinear functions arise naturally from the connection to state updates in Bayes filters like PSRs, in which observations can be viewed as gating belief states. We show that PSRNNs can be learned effectively by combining backpropogation through time (BPTT) with an initialization based on a statistically consistent learning algorithm for PSRs called two-stage regression (2SR). We also show that PSRNNs can be can be factorized using tensor decomposition, reducing model size and suggesting interesting theoretical connections to existing multiplicative architectures such as LSTMs. We applied PSRNNs to 4 datasets, and showed that we outperform several popular alternative approaches to modeling dynamical systems in all cases.

[1]  F. L. Hitchcock The Expression of a Tensor or a Polyadic as a Sum of Products , 1927 .

[2]  N. Aronszajn Theory of Reproducing Kernels. , 1950 .

[3]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .

[4]  Miss A.O. Penney (b) , 1974, The New Yale Book of Quotations.

[5]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[6]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[7]  Bart De Moor,et al.  Numerical algorithms for state space subspace system identification , 1993 .

[8]  Bart De Moor,et al.  N4SID: Subspace algorithms for the identification of combined deterministic-stochastic systems , 1994, Autom..

[9]  Hiroshi Mamitsuka Learning Hidden Markov Models Using Back-Propagation through Time , 1995 .

[10]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[11]  Zoubin Ghahramani,et al.  A Unifying Review of Linear Gaussian Models , 1999, Neural Computation.

[12]  Richard S. Sutton,et al.  Predictive Representations of State , 2001, NIPS.

[13]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[14]  Sebastian Thrun,et al.  Probabilistic robotics , 2002, CACM.

[15]  Le Song,et al.  A Hilbert Space Embedding for Distributions , 2007, Discovery Science.

[16]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[17]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[18]  Alexander J. Smola,et al.  Hilbert space embeddings of conditional distributions with applications to dynamical systems , 2009, ICML '09.

[19]  James Martens,et al.  Learning the Linear Dynamical System with ASOS , 2010, ICML.

[20]  Byron Boots,et al.  Closing the learning-planning loop with predictive state representations , 2009, Int. J. Robotics Res..

[21]  Le Song,et al.  Hilbert Space Embeddings of Hidden Markov Models , 2010, ICML.

[22]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[23]  Dieter Fox,et al.  Learning GP-BayesFilters via Gaussian process latent variable models , 2009, Auton. Robots.

[24]  Byron Boots,et al.  An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems , 2011, AAAI.

[25]  Biao Huang,et al.  System Identification , 2000, Control Theory for Physicists.

[26]  Byron Boots,et al.  Hilbert Space Embeddings of Predictive State Representations , 2013, UAI.

[27]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[28]  Alessandro Sperduti,et al.  A HMM-based pre-training approach for sequential data , 2014, ESANN.

[29]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[30]  Xi Chen,et al.  Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing , 2014, J. Mach. Learn. Res..

[31]  Le Song,et al.  Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Method , 2015, UAI.

[32]  Dean Alderucci A SPECTRAL ALGORITHM FOR LEARNING HIDDEN MARKOV MODELS THAT HAVE SILENT STATES , 2015 .

[33]  Sham M. Kakade,et al.  A Linear Dynamical System Model for Text , 2015, ICML.

[34]  Geoffrey J. Gordon,et al.  Supervised Learning for Dynamical System Learning , 2015, NIPS.

[35]  Ying Zhang,et al.  On Multiplicative Integration with Recurrent Neural Networks , 2016, NIPS.

[36]  Sergey Levine,et al.  Backprop KF: Learning Discriminative Deterministic State Estimators , 2016, NIPS.

[37]  Geoffrey J. Gordon,et al.  Practical Learning of Predictive State Representations , 2017, ArXiv.