Textual Approximation Methods for Time Series Classification: TAX and l-TAX

SUMMARY Time series classification and similarity search have been studied extensively over the past decades. However, the classification accuracy achieved so far is still insufficient for applications such as ubiquitous or sensor systems. In this paper, a novel textual approximation of a time series, called TAX, is proposed to achieve high-accuracy time series classification. l-TAX, an extended version of TAX that shows promising classification accuracy compared with TAX and other existing methods, is also proposed. We provide a comprehensive comparison between TAX and l-TAX and discuss the benefits of both methods. Both TAX and l-TAX transform a time series into a textual structure using existing document retrieval methods and bioinformatics algorithms. In TAX, a time series is represented as a document-like structure, whereas l-TAX uses a sequence of textual symbols. This paper provides a comprehensive overview of the textual approximation and the techniques used by TAX and l-TAX.
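The general idea of classifying a time series through a sequence of textual symbols can be illustrated with a minimal sketch. This is not the TAX or l-TAX algorithm, whose details are not given in this summary. It assumes an equal-width discretization into letters, a longest-common-subsequence similarity, and a 1-nearest-neighbor classifier; the names `to_symbols`, `lcs_length`, `similarity`, and `classify_1nn` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def to_symbols(series: Sequence[float], alphabet: str = "abcd") -> str:
    """Map each value to a letter using equal-width bins over the series range.

    Assumption: equal-width binning is used here only for illustration.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series falls entirely into the first bin.
        return alphabet[0] * len(series)
    width = (hi - lo) / len(alphabet)
    symbols: List[str] = []
    for value in series:
        # Clamp so that the maximum value lands in the last bin.
        index = min(int((value - lo) / width), len(alphabet) - 1)
        symbols.append(alphabet[index])
    return "".join(symbols)


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """LCS length normalized by the longer string, in the range [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def classify_1nn(query: Sequence[float],
                 training: Sequence[Tuple[Sequence[float], str]]) -> str:
    """Return the label of the training series most similar to the query."""
    query_text = to_symbols(query)
    best_label, best_score = "", -1.0
    for series, label in training:
        score = similarity(query_text, to_symbols(series))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    train = [([0, 1, 2, 3], "rising"), ([3, 2, 1, 0], "falling")]
    print(classify_1nn([0, 2, 4, 6], train))
```

Because each series is binned over its own range, the sketch ignores absolute scale and compares only shape; a real system would likely normalize and segment the series more carefully.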
