A Novel Trend Symbolic Aggregate Approximation for Time Series

Symbolic Aggregate Approximation (SAX) is a classical symbolic representation used in many time series data mining applications. However, SAX captures only the mean value of each segment and discards other important information, namely the trend of the values within the segment. This omission can lead to misclassification, because SAX cannot distinguish time series that have similar segment means but different trends. In this paper, we present Trend Feature Symbolic Aggregate Approximation (TFSAX) to solve this problem. First, we use Piecewise Aggregate Approximation (PAA) to reduce dimensionality and discretize the mean value of each segment with SAX. Second, we extract the trend feature of each segment using a trend distance factor and a trend shape factor. Third, we design multi-resolution symbolic mapping rules that discretize the trend information into symbols. We also propose a modified distance measure that integrates the SAX distance with a weighted trend distance, and we show that it provides a tighter lower bound on the Euclidean distance than the original SAX distance. Experimental results on diverse time series data sets demonstrate that the proposed representation significantly outperforms both the original SAX representation and an improved SAX representation in classification.
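The limitation that motivates this work can be illustrated with a minimal Python sketch of plain PAA and SAX. It is not the TFSAX method: the abstract does not define the trend distance factor, the trend shape factor, or the mapping rules, so none of them are reproduced here. The helper `slope_signs` is a hypothetical stand-in that only records whether each segment rises or falls, and the function names, the example series, and the choice of population standard deviation for z-normalization are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series (population standard deviation, an assumption)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def split_segments(series, n_segments):
    """Split a series into equal-length segments, as PAA does."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    width = len(series) // n_segments
    return [series[i * width:(i + 1) * width] for i in range(n_segments)]


def sax_breakpoints(alphabet_size):
    """Breakpoints that cut the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series, n_segments, alphabet_size):
    """Plain SAX: z-normalize, take segment means (PAA), map means to letters."""
    breakpoints = sax_breakpoints(alphabet_size)
    letters = []
    for segment in split_segments(znorm(series), n_segments):
        segment_mean = mean(segment)
        index = sum(1 for b in breakpoints if segment_mean > b)
        letters.append(chr(ord("a") + index))
    return "".join(letters)


def slope_signs(series, n_segments):
    """Hypothetical trend stand-in (NOT the paper's trend factors):
    'u' if a segment ends higher than it starts, 'd' if lower, 'f' if flat."""
    signs = []
    for segment in split_segments(series, n_segments):
        change = segment[-1] - segment[0]
        signs.append("u" if change > 0 else "d" if change < 0 else "f")
    return "".join(signs)


# Two series with identical segment means but opposite trends in each segment.
series_a = [1, 2, 3, 4, 8, 7, 6, 5]
series_b = [4, 3, 2, 1, 5, 6, 7, 8]

print(sax_word(series_a, 2, 4), sax_word(series_b, 2, 4))  # ad ad
print(slope_signs(series_a, 2), slope_signs(series_b, 2))  # ud du
```

Both series receive the same SAX word because their segment means coincide, while even a crude per-segment trend indicator separates them. This is the kind of information a trend-aware representation such as TFSAX is designed to retain.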
