iSAX: indexing and mining terabyte sized time series

Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, the algorithms and the size of data considered have generally not been representative of the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we show how a novel multi-resolution symbolic representation can be used to index datasets that are several orders of magnitude larger than anything else considered in the literature. Our approach allows both fast exact search and ultra-fast approximate search. We show how to exploit the combination of both types of search as subroutines in data mining algorithms, allowing for the exact mining of truly massive real-world datasets containing millions of time series.
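The abstract does not spell out how the multi-resolution symbolic representation works. The code below is a minimal sketch that assumes the standard SAX construction that iSAX builds on: z-normalization, Piecewise Aggregate Approximation (PAA), and breakpoints that cut N(0, 1) into 2^b equiprobable regions. It shows two properties the abstract relies on. First, a word at cardinality 2^b can be demoted to any coarser cardinality by dropping low-order bits, which is what makes the representation multi-resolution. Second, a MINDIST-style lower bound on the true distance is what lets exact search prune candidates safely. All function names are illustrative and do not come from the paper. The sketch assumes the series length is divisible by the word length and that both compared words share one cardinality. The full iSAX index also compares words whose symbols have different cardinalities, and this sketch omits that case.

```python
import bisect
import math
import random
from statistics import NormalDist


def znorm(ts):
    """Z-normalize with the population std; a flat series maps to all zeros."""
    mu = sum(ts) / len(ts)
    sd = math.sqrt(sum((x - mu) ** 2 for x in ts) / len(ts))
    return [0.0] * len(ts) if sd == 0 else [(x - mu) / sd for x in ts]


def paa(ts, w):
    """Piecewise Aggregate Approximation; assumes len(ts) % w == 0."""
    seg = len(ts) // w
    return [sum(ts[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def breakpoints(bits):
    """Breakpoints cutting N(0, 1) into 2**bits equiprobable regions."""
    card = 2 ** bits
    return [NormalDist().inv_cdf(i / card) for i in range(1, card)]


def isax_word(ts, w, bits):
    """One symbol per PAA segment: symbol k is the k-th region from the bottom."""
    bps = breakpoints(bits)
    return [bisect.bisect_right(bps, v) for v in paa(znorm(ts), w)]


def demote(word, from_bits, to_bits):
    """Coarsen every symbol by dropping its (from_bits - to_bits) low-order bits."""
    return [s >> (from_bits - to_bits) for s in word]


def mindist(word_a, word_b, n, bits):
    """Lower bound on the Euclidean distance between the z-normalized series.

    Both words must share one cardinality; adjacent or equal symbols contribute 0.
    """
    bps = breakpoints(bits)
    total = 0.0
    for r, c in zip(word_a, word_b):
        if abs(r - c) > 1:
            lo, hi = min(r, c), max(r, c)
            # Gap between the top of the lower region and the bottom of the upper one.
            total += (bps[hi - 1] - bps[lo]) ** 2
    return math.sqrt(n / len(word_a)) * math.sqrt(total)


def random_walk(n, rng):
    """Toy data only: a Gaussian random walk of length n."""
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0, 1)
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = random_walk(64, rng), random_walk(64, rng)
    wa, wb = isax_word(a, 8, 3), isax_word(b, 8, 3)  # 8 segments, cardinality 8
    print(wa, demote(wa, 3, 1))  # the same word at cardinality 8 and at cardinality 2
    print(mindist(wa, wb, 64, 3))  # never exceeds the true Euclidean distance
```

The breakpoints for cardinality 2^(b-1) are exactly every other breakpoint for 2^b. As a result, the coarser symbol is always the finer one shifted right by one bit. This means a word stored at high cardinality can be compared against coarser index nodes without recomputing it from the raw series.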
