GFSM: a Feature Selection Method for Improving Time Series Forecasting

Handling time series forecasting with many predictors is a popular topic in the era of "Big Data", where vast amounts of observed variables are stored and used in analytic processes. Classical prediction models face limitations when applied to large-scale data: using all available predictors increases computation time and does not necessarily improve forecast accuracy. The challenge is to extract, for each target time series, the most relevant predictors contributing to its forecast. We propose a causal feature selection algorithm specific to multiple time series forecasting, based on a clustering approach. Experiments are conducted on US and Australian macroeconomic datasets using several prediction models. We compare our method against widely used dimension reduction and feature selection methods, including principal component analysis (PCA), kernel PCA, and factor analysis. The proposed algorithm improves forecast accuracy over the evaluated methods on the tested datasets.
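The abstract does not spell out the algorithm's internals, so the following is only an illustrative sketch of the general idea it describes: score each candidate predictor by a causality-style measure against the target, group redundant predictors together, and keep one representative per group. Here a lagged correlation stands in for a proper causality test (e.g. Granger causality), and a simple greedy similarity grouping stands in for the paper's clustering step; all function names, the `threshold` parameter, and the scoring choice are hypothetical, not the authors' actual method.

```python
import math


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def causal_score(pred, target, lag=1):
    """Proxy for causal strength: correlation of the predictor's past
    with the target's future (a real system would use e.g. a Granger test)."""
    return abs(corr(pred[:-lag], target[lag:]))


def select_features(predictors, target, threshold=0.9, lag=1):
    """Pick one representative predictor per group of mutually redundant ones.

    predictors: dict mapping name -> list of observations
    target:     list of observations of the series to forecast
    """
    # 1. score each predictor by its lagged correlation with the target
    scores = {name: causal_score(s, target, lag) for name, s in predictors.items()}
    # 2. greedily group predictors whose pairwise correlation exceeds the
    #    threshold (a stand-in for the clustering step)
    groups = []
    for name in sorted(scores, key=scores.get, reverse=True):
        for g in groups:
            if abs(corr(predictors[name], predictors[g[0]])) >= threshold:
                g.append(name)
                break
        else:
            groups.append([name])
    # 3. keep the best-scoring representative of each group
    return [g[0] for g in groups]
```

For example, two perfectly collinear predictors that both lead the target end up in one group and only one of them is selected, while an unrelated series forms its own group.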
