Effective Feature Preprocessing for Time Series Forecasting

Time series forecasting is an important area of data mining research. Feature preprocessing techniques have a significant influence on forecasting accuracy and are therefore essential to a forecasting model. Although several feature preprocessing techniques have been applied to time series forecasting, no systematic study has so far compared their performance, so how to select effective feature preprocessing techniques for a forecasting model remains an open problem. In this paper, we conduct a comprehensive study of existing feature preprocessing techniques to evaluate their empirical performance in time series forecasting. Our experimental results demonstrate that effective feature preprocessing can significantly enhance forecasting accuracy. This research can serve as useful guidance for researchers in selecting feature preprocessing techniques effectively and integrating them with time series forecasting models.
