A novel Bayesian and Chain Rule Model on symbolic representation for time series classification

Time series data are generated in abundance by information technology applications. Classification of time series is one of the significant areas of interest in time series data mining. Over the last decade, different approaches have been developed to improve the effectiveness and efficiency of time series classification. We propose a new model for time series classification that first uses the SAX (Symbolic Aggregate approXimation) method to transform the time series into symbols and then applies a probabilistic methodology to classify the resulting symbolic sequences. The model is built on Bayes' rule and the probability chain rule applied to the symbolic representation of the time series, together with a penalty for mismatching points. We call this model the Bayesian and Chain rule Model (BCM). The performance of BCM was evaluated on 43 time series data sets. We performed an extensive experiment to compare BCM with Euclidean Distance, Dynamic Time Warping (DTW) with the best warping window, and DTW with no warping window. The comparison was carried out in three stages. Stage 1 predicted the domains and problems in which BCM works relatively well. Stage 2 compared the prediction accuracy of BCM with that of the other methods. Stage 3 constructed the confusion matrices for the predictions made in Stage 1 and measured and compared the predictability of the methods. The results show that BCM is more accurate than the other popular time series classification methods on the problems where it is predicted to perform better.
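The two-step idea described above (SAX symbolization followed by a Bayes and chain rule classifier) can be illustrated with a minimal sketch. This is not the BCM formulation from the paper, whose exact factorization and penalty term are not given in this abstract. The sketch assumes a first-order Markov approximation of the chain rule, and it uses a fixed floor probability (the hypothetical `penalty` parameter) as a stand-in for the mismatch penalty. The names `sax`, `ChainRuleClassifier`, and `log_score` are illustrative, and `sax` assumes the series has at least `n_segments` points.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series to a SAX word: z-normalize, PAA, discretize."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each near-equal-width segment.
    n = len(z)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    return "".join(chr(ord("a") + sum(v > c for c in cuts)) for v in paa)


class ChainRuleClassifier:
    """Illustrative classifier: log P(class) + sum of log P(symbol | previous symbol, class)."""

    def __init__(self, penalty=1e-3):
        # Assumed floor probability for symbols or transitions unseen in a class.
        self.penalty = penalty
        self.class_counts = defaultdict(int)
        self.first = defaultdict(lambda: defaultdict(int))
        self.trans = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            self.class_counts[label] += 1
            self.first[label][word[0]] += 1
            for a, b in zip(word, word[1:]):
                self.trans[label][a][b] += 1
        return self

    def log_score(self, word, label):
        # Unnormalized log posterior: class prior times chain-rule likelihood.
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        p_first = self.first[label][word[0]] / self.class_counts[label]
        score += math.log(max(p_first, self.penalty))
        for a, b in zip(word, word[1:]):
            row = self.trans[label][a]
            seen = sum(row.values())
            p = row[b] / seen if seen else 0.0
            score += math.log(max(p, self.penalty))
        return score

    def predict(self, word):
        return max(self.class_counts, key=lambda label: self.log_score(word, label))


rising = sax([0, 1, 2, 3, 4, 5, 6, 7], n_segments=4, alphabet_size=4)
falling = sax([7, 6, 5, 4, 3, 2, 1, 0], n_segments=4, alphabet_size=4)
print(rising, falling)  # abcd dcba

clf = ChainRuleClassifier().fit(
    ["abcd", "abcc", "dcba", "ddba"], ["up", "up", "down", "down"]
)
print(clf.predict("abdd"))  # up
```

In this sketch, a test word that contains transitions never observed for a class is not ruled out for that class; each unseen transition only costs `log(penalty)`, so the class with the fewest mismatches tends to win.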
