TrSAX: An improved time series symbolic representation for classification

Symbolic Aggregate approXimation (SAX) is a major symbolic representation method that has been widely used in time series data mining. It uses the mean value of a segment as the symbol. However, the SAX representation ignores the trend of the values within the segment, so it cannot distinguish time series that have different trends but the same mean value symbol, which may cause incorrect classification in some cases. In this paper, we propose an improved symbolic representation that integrates SAX with the least squares method to describe both the mean value and the trend information of the time series. We compare classifiers using the original SAX, two improved SAX representations, and two other classifiers that are highly representative and competitive for short time series classification. The results show that the classifier using our representation achieves a lower error rate than those five classifiers on most datasets.
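The limitation described above can be illustrated with a minimal sketch. It is not the paper's TrSAX encoding, whose details are not given in this abstract: it only pairs a standard SAX symbol (segment mean mapped through Gaussian breakpoints) with a least squares slope for each segment. The function names `znorm`, `breakpoints`, `ls_slope` and `sax_with_trend` are hypothetical, the series length is assumed to be divisible by the number of segments, and the slope is left as a raw number rather than being turned into a symbol.

```python
from bisect import bisect
from statistics import NormalDist, mean


def znorm(series):
    """Z-normalise a series (population standard deviation)."""
    m = mean(series)
    sd = (sum((x - m) ** 2 for x in series) / len(series)) ** 0.5
    if sd == 0:
        return [0.0] * len(series)
    return [(x - m) / sd for x in series]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def ls_slope(segment):
    """Slope of the least squares line fitted to (0, y0), (1, y1), ..."""
    n = len(segment)
    t_mean = (n - 1) / 2
    y_mean = mean(segment)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den if den else 0.0


def sax_with_trend(series, n_segments, alphabet_size):
    """Return one (SAX symbol, least squares slope) pair per segment.

    Assumption: len(series) is divisible by n_segments.
    """
    z = znorm(series)
    seg_len = len(z) // n_segments
    cuts = breakpoints(alphabet_size)
    result = []
    for i in range(n_segments):
        seg = z[i * seg_len:(i + 1) * seg_len]
        symbol = chr(ord("a") + bisect(cuts, mean(seg)))
        result.append((symbol, ls_slope(seg)))
    return result


# Two series whose segments share the same mean but have opposite trends.
up_down = [0, 1, 2, 3, 3, 2, 1, 0]
down_up = [3, 2, 1, 0, 0, 1, 2, 3]
a = sax_with_trend(up_down, n_segments=2, alphabet_size=3)
b = sax_with_trend(down_up, n_segments=2, alphabet_size=3)
```

In this example the symbols of `a` and `b` are identical, so plain SAX would treat the two series as the same word, while the per-segment slopes have opposite signs and keep them apart.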
