Improving Multi-Step Prediction of Learned Time Series Models

Most typical statistical and machine-learning approaches to time series modeling optimize a single-step prediction error. In multi-step simulation, the learned model is applied iteratively, feeding each output back in as the next input. Any such predictor, however, inevitably introduces errors, and these compounding errors change the input distribution for future prediction steps, breaking the train-test i.i.d. assumption common in supervised learning. We present an approach that reuses training data to make a no-regret learner robust to errors made during multi-step prediction. Our insight is to formulate the problem as imitation learning: the training data serves as a "demonstrator" by providing corrections for the errors made during multi-step prediction. Through this reduction of multi-step time series prediction to imitation learning, we establish a strong theoretical guarantee relating the training error to the multi-step prediction error. We present experimental results for our method, Data as Demonstrator (DaD), and show significant improvement over the traditional approach in two notably different domains: dynamic system modeling and video texture prediction.
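To make the reduction concrete, below is a minimal sketch of the kind of data-aggregation loop the abstract describes, assuming a single observed trajectory and a generic scikit-learn-style regressor (Ridge is an arbitrary stand-in). The function name `dad_train` and all hyperparameters are illustrative, not the paper's reference implementation: the model is rolled out from the true start state, and each predicted state is paired with the true next state from the training data, which acts as the "demonstrator" correction.

```python
import numpy as np
from sklearn.linear_model import Ridge


def dad_train(X_train, n_iters=10, horizon=None):
    """Illustrative Data-as-Demonstrator-style training loop.

    X_train: array of shape (T, d), one observed trajectory x_1..x_T.
    The learner is first fit on single-step pairs (x_t -> x_{t+1});
    then, on each iteration, the model is rolled out and every
    predicted state is paired with the true next state at that time
    index. These correction pairs are aggregated into the dataset.
    """
    T = len(X_train)
    horizon = horizon or T - 1

    # Initial dataset: standard single-step (input, target) pairs.
    inputs = list(X_train[:-1])
    targets = list(X_train[1:])

    model = Ridge(alpha=1.0)
    model.fit(np.array(inputs), np.array(targets))

    for _ in range(n_iters):
        # Roll the learned model forward from the true initial state.
        x_hat = X_train[0]
        for t in range(min(horizon, T - 1)):
            # x_hat after this step is an estimate of x_{t+1}.
            x_hat = model.predict(x_hat.reshape(1, -1))[0]
            # Correction: from the predicted state, the training data
            # says the model should reach the true x_{t+2}.
            if t + 2 < T:
                inputs.append(x_hat)
                targets.append(X_train[t + 2])
        # Retrain on the aggregated dataset.
        model.fit(np.array(inputs), np.array(targets))

    return model
```

Note that the paper's performance guarantee rests on running a no-regret online learner over the aggregated dataset, in the style of DAgger; the batch retraining above is a simplification for readability.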
