A scalable framework for large time series prediction

Knowledge discovery systems are now expected to store and process very large volumes of data. With large collections of time series, multivariate prediction becomes increasingly difficult: using all available variables does not yield the most accurate predictions and causes problems for classical prediction models. In this article, we present a scalable process for large time series prediction, including a new algorithm for identifying time series predictors. The algorithm analyses the dependencies between time series using the mutual reinforcement principle between hubs and authorities of the HITS (Hyperlink-Induced Topic Search) algorithm. The proposed framework is evaluated on three real datasets. The results show that the best predictions are obtained with a very small number of predictors compared to the initial number of variables. The proposed feature selection algorithm shows promising results compared to widely known methods, such as classic and kernel principal component analysis, factor analysis, and the fast correlation-based filter method, and it improves the prediction accuracy for many time series in the datasets used.
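To illustrate the mutual reinforcement principle mentioned above, the following is a minimal sketch of the generic HITS iteration applied to a dependency graph between time series. It is not the article's selection algorithm, whose details are not given here. The function name `hits_scores`, the toy matrix, and the convention that `adjacency[i, j]` holds the strength with which series `i` helps predict series `j` are all assumptions made for illustration.

```python
import numpy as np


def hits_scores(adjacency, n_iter=100, tol=1e-10):
    """Generic HITS power iteration (hypothetical helper, not the paper's code).

    Assumption: adjacency[i, j] >= 0 is the strength of the dependency
    "series i helps predict series j". Returns (hubs, authorities),
    each normalized to unit Euclidean length.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    hubs = np.ones(n) / np.sqrt(n)
    auths = np.ones(n) / np.sqrt(n)

    for _ in range(n_iter):
        # A node is a good authority if good hubs point to it.
        new_auths = A.T @ hubs
        norm = np.linalg.norm(new_auths)
        if norm > 0:
            new_auths /= norm

        # A node is a good hub if it points to good authorities.
        new_hubs = A @ new_auths
        norm = np.linalg.norm(new_hubs)
        if norm > 0:
            new_hubs /= norm

        change = np.abs(new_hubs - hubs).sum() + np.abs(new_auths - auths).sum()
        hubs, auths = new_hubs, new_auths
        if change < tol:
            break

    return hubs, auths


# Toy dependency graph over 4 series (illustrative values only):
# series 0 helps predict series 1, 2 and 3; series 1 helps predict series 2.
toy_graph = np.array([
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
hubs, auths = hits_scores(toy_graph)

# Under the assumed edge convention, high hub scores mark candidate predictors.
ranking = np.argsort(-hubs)
```

Under this convention, series with high hub scores influence many well-predicted series, so keeping only the top few entries of `ranking` is one plausible way to obtain a small predictor set. How the edge weights are measured and how the scores are turned into a final selection are left open in this sketch.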
