Season- and Trend-aware Symbolic Approximation for Accurate and Efficient Time Series Matching

Processing and analyzing time series datasets has become a central issue in many domains, requiring data management systems to support time series as a native data type. A core access primitive for time series is matching, which requires efficient algorithms on top of appropriate representations such as the symbolic aggregate approximation (SAX), the current state of the art. This technique reduces a time series to a low-dimensional space by segmenting it and discretizing each segment into a symbol from a small alphabet. Unfortunately, SAX ignores the deterministic behavior of time series, such as cyclically repeating patterns or a trend component affecting all segments, which may lead to sub-optimal representation accuracy. We therefore introduce a novel season-aware and a novel trend-aware symbolic approximation and demonstrate improved representation accuracy without increasing the memory footprint. Most importantly, our techniques also enable more efficient time series matching, providing a match up to three orders of magnitude faster than SAX.
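For readers unfamiliar with the baseline, the following is a minimal sketch of the classic SAX reduction mentioned above (segmenting, then discretizing each segment), not the season- or trend-aware variants introduced in this work. The function names `paa` and `sax`, the requirement that the series length be a multiple of the segment count, and the use of letters starting at `a` as the alphabet are assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-length segment."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    seg_len = len(series) // n_segments
    return [mean(series[i:i + seg_len]) for i in range(0, len(series), seg_len)]


def sax(series, n_segments, alphabet_size):
    """Z-normalize, reduce with PAA, then map each segment mean to a symbol."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; treat it as all zeros.
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]
    # Breakpoints split the standard normal distribution into equiprobable bins.
    std_normal = NormalDist()
    breakpoints = [std_normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    # Each segment mean is replaced by the letter of the bin it falls into.
    return "".join(chr(ord("a") + bisect_right(breakpoints, m))
                   for m in paa(normalized, n_segments))


word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4)
print(word)  # abcd
```

In this toy input the linear trend alone determines every symbol, which illustrates the kind of deterministic component that plain SAX spends its alphabet on.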
