Repeating patterns as symbols for long time series representation

A symbolic time series representation using repeating shapes as symbols is proposed. Properties of a lower-bounding similarity measure are evaluated on UCR datasets. An electricity consumption dataset is used for evaluation on very long, seasonal time series. Reconstruction error and dimensionality reduction ability are comparable to PAA.

Over the past years, many time series representations have been proposed, mainly for dimensionality reduction and as support for various algorithms in time series data processing. However, most transformation algorithms are iterative in nature, so they are directly applicable only to static collections of data and not to data streams. In this work we propose a symbolic representation of time series along with a method for transforming time series data into the proposed representation. Since one of the basic requirements for a usable representation is a distance measure that accurately reflects the true shape of the data, we propose a distance measure that operates on the proposed representation and lower bounds the Euclidean distance on the original data. We evaluate properties of the proposed representation and the distance measure on the UCR collection of datasets. As we focus on stream data processing, we also evaluate the properties and limitations of the proposed representation on very long time series from the domain of electricity consumption monitoring, simulating the processing of a potentially unbounded data stream.
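The lower-bounding requirement mentioned above can be illustrated with PAA, the baseline the abstract compares against. The sketch below is not the proposed symbolic representation or its distance measure; it only shows what "lower bounding the Euclidean distance" means for a reduced representation. The function names, the series length of 96 points, and the choice of 8 segments are assumptions made for illustration.

```python
import math
import random


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by the number of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_distance(a_paa, b_paa, original_length):
    """Distance on PAA vectors, scaled by the segment width.

    With this scaling the result never exceeds the Euclidean distance
    between the original series (the lower-bounding property).
    """
    width = original_length / len(a_paa)
    return math.sqrt(width) * euclidean(a_paa, b_paa)


# Hypothetical data: two random series of 96 points reduced to 8 segments.
random.seed(0)
n, segments = 96, 8
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]

true_dist = euclidean(a, b)
reduced_dist = paa_distance(paa(a, segments), paa(b, segments), n)

# Tightness of the lower bound: closer to 1 means a tighter bound.
tightness = reduced_dist / true_dist
print(f"Euclidean: {true_dist:.3f}, PAA lower bound: {reduced_dist:.3f}, "
      f"tightness: {tightness:.3f}")
```

A distance with this property can be used to prune candidates in similarity search without false dismissals, which is why the tightness of the bound is a common evaluation criterion for such representations.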
