Machine Learning Advances for Time Series Forecasting

In this paper, we survey the most recent advances in supervised machine learning and high-dimensional models for time series forecasting. We consider both linear and nonlinear alternatives. Among the linear methods, we pay special attention to penalized regressions and ensembles of models. The nonlinear methods considered in the paper include shallow and deep neural networks, in their feed-forward and recurrent versions, and tree-based methods, such as random forests and boosted trees. We also consider ensemble and hybrid models that combine ingredients from different alternatives. Tests for superior predictive ability are briefly reviewed. Finally, we discuss applications of machine learning in economics and finance and provide an illustration with high-frequency financial data.
