Symbolic Representation of Time Series: A Hierarchical Coclustering Formalization

The choice of an appropriate representation remains crucial for mining time series, particularly to reach a good trade-off between dimensionality reduction and the information retained. Symbolic representations offer a simple way to reduce dimensionality by turning time series into sequences of symbols. SAXO is a data-driven symbolic representation of time series that encodes typical distributions of data points. It was first introduced as a heuristic algorithm based on a regularized coclustering approach. The main contribution of this article is to formalize SAXO as a hierarchical coclustering approach. The search for the best symbolic representation given the data is thereby turned into a model selection problem. Comparative experiments demonstrate the benefit of the new formalization, which yields representations that drastically improve the compression of data.
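
To make the idea of turning a time series into a sequence of symbols concrete, here is a minimal sketch of a generic SAX-style encoding. It is not SAXO: it uses fixed, equiprobable Gaussian breakpoints, whereas SAXO learns its symbols from the data through coclustering. The function name `sax_symbols`, its parameters, and the default four-letter alphabet are illustrative assumptions, not taken from the article.

```python
from statistics import NormalDist, mean, pstdev

def sax_symbols(series, n_segments, alphabet="abcd"):
    """Illustrative SAX-style encoding (hypothetical helper, not SAXO).

    Steps: z-normalize, average each segment, map each average to a letter.
    Assumes 1 <= n_segments <= len(series).
    """
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series is mapped to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    n = len(z)
    # Segment boundaries for a piecewise aggregate approximation
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    averages = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints that cut the standard normal into equiprobable bins
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # The symbol index is the number of breakpoints below the segment average
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in averages)

word = sax_symbols([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2)
```

Here eight data points are reduced to a two-letter word, one letter per segment. A data-driven representation such as SAXO instead chooses the time intervals and the symbols to fit the observed distributions.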
