Selection of significant input variables for time series forecasting

Appropriate selection of inputs for time series forecasting models is important because it can not only improve forecasting performance but also reduce the cost of data collection. This paper investigates the selection performance of three input selection techniques: two model-free techniques, partial linear correlation (PLC) and partial mutual information (PMI), and one model-based technique based on genetic programming (GP). Four hypothetical datasets and two real datasets were used to demonstrate the performance of the three techniques. The results suggest that two techniques can be recommended for the input selection task: the model-free PLC technique, because of its computational simplicity, and the model-based GP technique, because of its ability to detect non-linear relationships (demonstrated by its relatively good performance on a hypothetical complex non-linear dataset). Candidate inputs selected by both of these recommended techniques should be considered significant inputs.

Comparative evaluation of two model-free input selection methods and one model-based method. Four synthetic and two real datasets are used for the comparative evaluation. Model-free techniques: partial linear correlation (PLC) and partial mutual information (PMI). Model-based technique: based on genetic programming (GP). Inputs selected by both PLC and GP are recommended as the significant inputs.
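To illustrate the general idea behind model-free selection with partial linear correlation, the following is a minimal Python sketch, not the paper's implementation. It assumes a greedy forward search in which the partial correlation between each remaining candidate and the target is computed from least-squares residuals after removing the linear effect of the inputs already selected. The function names `partial_corr` and `plc_select`, the fixed stopping `threshold`, and the synthetic data are all hypothetical choices made for this example; the paper's actual stopping criterion is not stated in the abstract.

```python
import numpy as np


def partial_corr(x, y, Z):
    """Linear correlation of x and y after regressing both on the columns of Z."""
    if Z.shape[1] == 0:
        # Nothing to condition on: plain Pearson correlation.
        return float(np.corrcoef(x, y)[0, 1])
    A = np.column_stack([np.ones(len(x)), Z])  # intercept + conditioning inputs
    rx = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]  # residual of x given Z
    ry = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]  # residual of y given Z
    return float(np.corrcoef(rx, ry)[0, 1])


def plc_select(X, y, threshold=0.2):
    """Greedy forward selection by absolute partial correlation.

    `threshold` is an illustrative stopping rule (an assumption of this sketch).
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining:
        Z = X[:, selected]  # shape (n, 0) on the first pass
        scores = {j: abs(partial_corr(X[:, j], y, Z)) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical synthetic data: y depends on columns 0 and 1 only;
# column 3 is a noisy copy of column 0 (redundant), column 2 is irrelevant.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
X[:, 3] = X[:, 0] + 0.5 * rng.normal(size=n)
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=n)

selected = plc_select(X, y)
print("selected candidate columns:", selected)
```

In a forecasting setting the columns of `X` would typically be lagged values of the target and of exogenous series. Because the score is a linear measure, a sketch like this would be expected to miss purely non-linear dependence, which is the gap the abstract attributes to the PMI and GP techniques.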
