Criteria for Classifying Forecasting Methods

Abstract: Classifying forecasting methods as either "machine learning" or "statistical" has become commonplace in parts of the forecasting literature and community, as exemplified by the M4 competition and the conclusions drawn by its organizers. We argue that this distinction does not stem from fundamental differences in the methods assigned to either class. Instead, the distinction is probably tribal in nature, and it limits insight into the appropriateness and effectiveness of different forecasting methods. We propose alternative characteristics of forecasting methods which, in our view, allow meaningful conclusions to be drawn. Further, we discuss the areas of forecasting that could benefit most from cross-pollination between the machine learning and statistics communities.
