A Causality Based Feature Selection Approach for Multivariate Time Series Forecasting

The field of time series forecasting has progressed significantly in recent decades, especially with regard to forecasting economic data. However, some issues remain, in particular when working with sets of time series that contain a large number of variables. Hence, a selection step is usually needed to reduce the number of variables used to forecast each target time series. In this paper, we propose a feature selection and/or dimension reduction algorithm for forecasting multivariate time series, based on (i) the notion of Granger causality and (ii) a selection step based on a clustering strategy. Finally, we carry out experiments on several real data sets, comparing our proposal with some of the most widely used feature selection methods. The experiments show that our approach improves forecasting accuracy compared with the evaluated methods.
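The abstract describes the pipeline only at a high level, so the following Python sketch (standard library only) illustrates one possible reading of it. Every concrete choice here is an assumption and not necessarily the paper's method. Granger causality strength is measured with a Geweke-style log-ratio of residual sums of squares from lag-`p` autoregressions. Candidate variables are clustered with a simple k-medoids procedure on the distance 1 - |correlation|; it uses a greedy PAM-style BUILD start followed by alternating assign/update steps, not the full PAM swap search. One variable per cluster is kept: the one with the largest causality score toward the target. The function names (`granger_score`, `k_medoids`, `select_features`, `make_demo`) and the lag order `p` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def _ols_rss(X: List[List[float]], y: List[float], ridge: float = 1e-9) -> float:
    """Residual sum of squares of the least-squares fit y ~ X (normal equations)."""
    k = len(X[0])
    # Augmented matrix [X'X + ridge*I | X'y]; the tiny ridge guards against singular systems.
    a = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting turns X'X into a diagonal matrix.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= f * a[col][c]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def granger_score(target: Sequence[float], cause: Sequence[float], p: int = 2) -> float:
    """Geweke-style strength ln(RSS_restricted / RSS_unrestricted) for lag order p.

    Restricted: target_t on its own p lags. Unrestricted: also p lags of `cause`.
    """
    rows_r, rows_u, y = [], [], []
    for t in range(p, len(target)):
        own = [target[t - lag] for lag in range(1, p + 1)]
        other = [cause[t - lag] for lag in range(1, p + 1)]
        rows_r.append([1.0] + own)
        rows_u.append([1.0] + own + other)
        y.append(target[t])
    rss_r, rss_u = _ols_rss(rows_r, y), _ols_rss(rows_u, y)
    return math.log(max(rss_r, 1e-12) / max(rss_u, 1e-12))


def _corr_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - |Pearson correlation|: near 0 for redundant series, near 1 for unrelated ones."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va, vb = sum((u - ma) ** 2 for u in a), sum((v - mb) ** 2 for v in b)
    return 1.0 - abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 1.0


def k_medoids(dist: List[List[float]], k: int, iters: int = 50) -> List[List[int]]:
    """Greedy PAM-style BUILD start, then alternating assign/update (no full PAM SWAP)."""
    n = len(dist)
    k = max(1, min(k, n))
    medoids = [min(range(n), key=lambda c: sum(dist[c]))]

    def cost_if_added(c: int) -> float:
        # Total distance of every point to its nearest medoid if c were also a medoid.
        return sum(min(dist[i][c], min(dist[i][m] for m in medoids)) for i in range(n))

    while len(medoids) < k:
        medoids.append(min((c for c in range(n) if c not in medoids), key=cost_if_added))
    clusters: List[List[int]] = []
    for _ in range(iters):
        # Assignment step: attach every point to its nearest medoid.
        groups: Dict[int, List[int]] = {m: [] for m in medoids}
        for i in range(n):
            groups[min(medoids, key=lambda m: dist[i][m])].append(i)
        clusters = [g for g in groups.values() if g]
        # Update step: the new medoid minimises the summed distance within its cluster.
        new = [min(g, key=lambda c: sum(dist[c][j] for j in g)) for g in clusters]
        if set(new) == set(medoids):
            break
        medoids = new
    return clusters


def select_features(target: Sequence[float], candidates: Dict[str, Sequence[float]],
                    k: int, p: int = 2) -> List[str]:
    """Cluster the candidates, keep the most Granger-causal member of each cluster."""
    names = list(candidates)
    scores = {nm: granger_score(target, candidates[nm], p) for nm in names}
    dist = [[_corr_distance(candidates[a], candidates[b]) for b in names] for a in names]
    picks = [max((names[i] for i in g), key=scores.__getitem__) for g in k_medoids(dist, k)]
    return sorted(picks, key=lambda nm: -scores[nm])


def make_demo(n: int = 300, seed: int = 0) -> Tuple[List[float], Dict[str, List[float]]]:
    """Synthetic data: x1 drives the target, x2 is a noisy copy of x1, x3 is pure noise."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [v + rng.gauss(0, 0.1) for v in x1]
    x3 = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.5 * y[-1] + 0.8 * x1[t - 1] + rng.gauss(0, 0.1))
    return y, {"x1": x1, "x2": x2, "x3": x3}
```

On the synthetic data from `make_demo`, `select_features(*make_demo(), k=2)` should keep one of the redundant pair `x1`/`x2` (they share a cluster) and rank it first. The unrelated `x3` is also kept, because it forms a cluster of its own. This exposes a limitation of the sketch: every cluster contributes a variable, so a practical variant would likely also need a minimum-score threshold and a rule for choosing `k`. The abstract does not specify either.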
