Predictive-State Decoders: Encoding the Future into Recurrent Networks

Recurrent neural networks (RNNs) are a vital modeling technique that relies on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes characterized by underlying latent states whose form is often unknown, precluding their analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent-state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation so that it targets the prediction of future observations. Predictive-State Decoders are simple to implement and easily incorporated into existing training pipelines as an additional loss regularizer. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, imitation learning, and reinforcement learning. In each, our method improves the statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.
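To make the mechanism concrete, the PSD idea amounts to attaching a small decoder head to the RNN's hidden state, trained to regress the next k observations, and adding that regression error to the primary task loss. The following is a minimal PyTorch sketch under stated assumptions, not the paper's exact architecture: the single linear decoder, the MSE predictive-state loss, the horizon PSD_HORIZON, and the weight LAMBDA_PSD are all illustrative choices.

```python
# Minimal sketch of a Predictive-State Decoder (PSD) as an auxiliary loss.
# Assumptions (illustrative, not from the paper): one linear decoder head,
# an MSE predictive-state loss, horizon k = 3, and regularizer weight 0.5.
import torch
import torch.nn as nn
import torch.nn.functional as F

PSD_HORIZON = 3   # k: number of future observations the decoder targets
LAMBDA_PSD = 0.5  # weight of the predictive-state regularizer

class RNNWithPSD(nn.Module):
    def __init__(self, obs_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        # Primary task head: next-step observation prediction.
        self.task_head = nn.Linear(hidden_dim, obs_dim)
        # PSD head: decodes the next k observations from the hidden state.
        self.psd_head = nn.Linear(hidden_dim, PSD_HORIZON * obs_dim)

    def forward(self, obs):
        # obs: (batch, T, obs_dim); h: hidden state at every timestep.
        h, _ = self.rnn(obs)
        return self.task_head(h), self.psd_head(h)

def psd_loss(psd_pred, obs):
    # Supervise the hidden state at time t to predict obs[t+1 .. t+k].
    batch, T, obs_dim = obs.shape
    k = PSD_HORIZON
    losses = []
    for t in range(T - k):
        target = obs[:, t + 1 : t + 1 + k, :].reshape(batch, -1)
        losses.append(F.mse_loss(psd_pred[:, t, :], target))
    return torch.stack(losses).mean()

# Training step: the PSD term is simply added to the existing task loss.
model = RNNWithPSD(obs_dim=4, hidden_dim=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

obs = torch.randn(8, 20, 4)  # toy observation sequences
task_pred, psd_pred = model(obs)
task_loss = F.mse_loss(task_pred[:, :-1, :], obs[:, 1:, :])
loss = task_loss + LAMBDA_PSD * psd_loss(psd_pred, obs)
opt.zero_grad()
loss.backward()
opt.step()
```

Because the regularizer is just an extra term in the scalar loss, it drops into an existing training pipeline without changing the optimizer, the recurrent cell, or the task head, which is the "simple to implement" property the abstract claims.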
