Discovery of motifs to forecast outlier occurrence in time series

Forecasting real-world time series must cope with unexpected values, commonly known as outliers. Outliers can lead to unreliable models and poor forecasts, so identifying when future outliers will occur is an essential task in time series analysis and helps reduce the average forecasting error. The main goal of this work is to predict the occurrence of outliers in time series based on the discovery of motifs. Here, motifs are the pattern sequences that precede data marked as anomalous by the proposed metaheuristic in a training set. Once the motifs have been discovered, any data to be predicted that are preceded by one of them are identified as outliers and treated separately from the regular data. The forecasting of outlier occurrence has been added as an additional step to an existing time series forecasting algorithm (PSF), which is based on pattern sequence similarities. Robust statistical methods have been used to evaluate the accuracy of the proposed approach in forecasting both the occurrence of outliers and their corresponding values. Finally, the methodology has been tested on six electricity-related time series, in which most of the outliers were properly found and forecasted.
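
The motif idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the metaheuristic that marks anomalous data, so a simple median/MAD threshold is swapped in here as a stand-in, and the series is assumed to be already discretized into pattern labels (the label values, the window length, and the function names `flag_outliers`, `discover_motifs`, and `predicts_outlier` are all hypothetical).

```python
from statistics import median


def flag_outliers(values, k=3.0):
    """Mark values farther than k * MAD from the median.

    Assumption: this rule stands in for the paper's metaheuristic,
    which the abstract does not describe.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [abs(v - med) > k * mad for v in values]


def discover_motifs(labels, is_outlier, window=2):
    """Collect the label sequences of length `window` that
    immediately precede a point marked as an outlier."""
    motifs = set()
    for i in range(window, len(labels)):
        if is_outlier[i]:
            motifs.add(tuple(labels[i - window:i]))
    return motifs


def predicts_outlier(recent_labels, motifs, window=2):
    """Return True if the most recent `window` labels match a motif."""
    return tuple(recent_labels[-window:]) in motifs


# Toy training data (hypothetical): one pattern label and one value per step.
train_labels = ["a", "b", "c", "a", "b", "c", "a", "a"]
train_values = [10, 11, 50, 10, 11, 52, 10, 10]

outlier_flags = flag_outliers(train_values)
motifs = discover_motifs(train_labels, outlier_flags, window=2)

# A new point preceded by a known motif would be routed to separate handling.
next_is_outlier = predicts_outlier(["c", "a", "b"], motifs, window=2)
```

In this sketch a flagged point would then be forecast separately from the regular data, which is the role the abstract assigns to the extra step added to PSF.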
