A non-parametric symbolic approximate representation for long time series

For long time series, it is crucial to design low-dimensional representations that preserve the fundamental characteristics of the series. However, most approximate representations require many input parameters to be set. The main drawback of parameter-laden algorithms is that incorrect settings may keep an algorithm from achieving its best performance, that is, from reducing dimensionality while retaining shape information. This is especially likely when selecting suitable parameter values is not easy for the user. In this paper, we introduce a new approximate representation of time series, the non-parametric symbolic approximate representation (NSAR), which is based on multi-scale analysis, the approximation coefficients of the discrete wavelet transform (DWT), and key points. The novelty of the proposed representation is twofold. First, it uses a hierarchical mechanism to retain the shape information of the original time series. Second, it is symbolic: it employs key points and encodes the approximation coefficients, so it can greatly reduce the dimensionality of the original time series and potentially allows the application of text-based retrieval techniques. The proposed representation is fast, automatic, and requires no parameter tuning by the user. To show the efficacy of the new representation, we performed experiments with real and synthetic data. Experimental results show that NSAR preserves more of the fundamental characteristics of a series than symbolic aggregate approximation (SAX) at the same compression ratio, automatically determines the optimal decomposition level for the DWT, and outperforms SAX on best-matching queries.
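The three ingredients named in the abstract (DWT approximation coefficients, key points, and a symbolic encoding) can be illustrated with a minimal sketch. This is not the NSAR algorithm, whose details the abstract does not give. The Haar wavelet, the use of local extrema as key points, the equal-width binning, and the function names `haar_approx`, `key_points`, and `symbolize` are all assumptions made for illustration. In particular, the decomposition level and the alphabet are passed in by hand here, whereas NSAR is described as determining the level automatically.

```python
import math


def haar_approx(series, levels):
    """Haar approximation coefficients after `levels` decomposition steps.

    Assumption: Haar wavelet; an unpaired trailing sample is dropped.
    """
    coeffs = list(series)
    for _ in range(levels):
        if len(coeffs) < 2:
            break
        # Each step halves the length: scaled sums of adjacent pairs.
        coeffs = [
            (coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
            for i in range(0, len(coeffs) - 1, 2)
        ]
    return coeffs


def key_points(values):
    """Indices of the endpoints and of strict local extrema (hypothetical choice)."""
    if not values:
        return []
    idx = [0]
    for i in range(1, len(values) - 1):
        left = values[i] - values[i - 1]
        right = values[i + 1] - values[i]
        if left * right < 0:  # slope changes sign: a peak or a valley
            idx.append(i)
    if len(values) > 1:
        idx.append(len(values) - 1)
    return idx


def symbolize(values, alphabet="abcd"):
    """Map values to letters with equal-width bins (illustrative encoding only)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return alphabet[0] * len(values)
    width = (hi - lo) / len(alphabet)
    return "".join(
        alphabet[min(int((v - lo) / width), len(alphabet) - 1)] for v in values
    )


if __name__ == "__main__":
    series = [1, 1, 3, 3, 5, 5, 3, 3, 1, 1, 2, 2, 6, 6, 2, 2]
    approx = haar_approx(series, levels=2)
    kept = [approx[i] for i in key_points(approx)]
    print(symbolize(kept))
```

In this sketch the symbolic word is built only from the approximation coefficients that sit at key points, so its length depends on the shape of the series rather than on a fixed segment count.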
