A methodology for training set instance selection using mutual information in time series prediction

Training set instance selection is an important preprocessing step in many machine learning problems, including time series prediction; in practice it can improve prediction quality and may reduce training time. Mutual information (MI) has recently been proposed for regression tasks, mostly for feature selection and for separating genuine data from noise and outliers. This paper proposes a new methodology for training set instance selection in long-term time series prediction. The methodology combines a recursive prediction strategy with an advanced instance selection criterion, the nearest-neighbor-based MI estimator. At every prediction step, training instances are selected according to the MI computed between the instances of the initial training set and the current forecasting instance. The novelty of the approach is that it tailors the training subset to the current forecasting instance and consequently reduces the uncertainty of the prediction. Selecting, at every step, the instances that share a large amount of MI with the current forecasting instance reduces error propagation and accumulation, two well-known shortcomings of the recursive prediction strategy, and thus leads to better forecasting quality. The approach also differs from others in that it is proposed not as an outlier detector but as an instance selection method for data that need not contain noise or outliers. Results on the data sets from the NN5 time series prediction competition indicate that the proposed method improves the quality of long-term prediction while reducing the number of instances needed to build the model.
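
The per-step selection loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how MI between two instances is computed or which regression model is trained, so the sketch assumes that the components of two lag vectors are treated as paired samples, uses a plain Kraskov-style k-nearest-neighbor MI estimate, and substitutes the mean target of the selected instances for a real learner. All function names and the parameters `lag`, `m`, and `k` are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def ksg_mi(x, y, k=3):
    """k-NN MI estimate (Kraskov et al., first variant) for paired scalar samples.

    Requires len(x) == len(y) > k.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in the joint space.
        dists = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j]))
                       for j in range(n) if j != i)
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count marginal neighbors strictly closer than eps.
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n

def make_instances(series, lag):
    """Build (lag-vector, next-value) training pairs from a series."""
    X = [list(series[t:t + lag]) for t in range(len(series) - lag)]
    y = [series[t + lag] for t in range(len(series) - lag)]
    return X, y

def select_instances(X, query, m, k=3):
    """Indices of the m training instances sharing the most MI with the query.

    Assumption: vector components are treated as paired samples.
    """
    scores = [ksg_mi(row, query, k) for row in X]
    order = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    return order[:m]

def recursive_forecast(series, lag, horizon, m, k=3):
    """Recursive multi-step forecast with instance selection at every step."""
    X, y = make_instances(series, lag)
    window = list(series[-lag:])  # current forecasting instance
    preds = []
    for _ in range(horizon):
        idx = select_instances(X, window, m, k)
        # Stand-in learner (assumption): mean target of the selected instances.
        pred = sum(y[i] for i in idx) / len(idx)
        preds.append(pred)
        # Feed the prediction back as an input for the next step.
        window = window[1:] + [pred]
    return preds

# Hypothetical usage on a synthetic weekly pattern.
series = [math.sin(2 * math.pi * t / 7) for t in range(60)]
preds = recursive_forecast(series, lag=8, horizon=5, m=10, k=3)
```

In a fuller version, the stand-in mean would be replaced by a regression model refitted on the selected subset at each step, which is where the reduction in training set size would pay off.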
