Evaluating distance measures and time series clustering for temporal pattern retrieval

This paper presents a new method for similarity search and retrieval of temporal motifs from time series data. The proposed approach first builds an index over important time series subsequences using subdimensional clustering. Then, at query time, rather than scanning the whole database to extract relevant answers for a given query, the method traverses the index, represented by the centroids of the generated clusters, and searches for subsequences similar to the query. Finally, relevant temporal associations between the returned motifs can be found using Formal Concept Analysis and Allen's relations.
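The index-then-query idea can be illustrated with a minimal sketch. It is not the paper's implementation: the subdimensional clustering step is not detailed here, so plain k-means over fixed-width sliding windows with Euclidean distance is swapped in as an assumption, and all names (`sliding_windows`, `build_index`, `query`, `n_probe`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sliding_windows(series, width):
    """Return (start, subsequence) pairs for every window of the given width."""
    return [(s, tuple(series[s:s + width]))
            for s in range(len(series) - width + 1)]


def nearest(point, centroids):
    """Index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)),
               key=lambda c: euclidean(point, centroids[c]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; a stand-in for the paper's clustering step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    # Final assignment, consistent with the final centroids.
    return centroids, [nearest(p, centroids) for p in points]


def build_index(series, width, k, seed=0):
    """Cluster all subsequences; the index is (centroids, cluster members)."""
    windows = sliding_windows(series, width)
    centroids, assignment = kmeans([w for _, w in windows], k, seed=seed)
    clusters = [[] for _ in range(k)]
    for window, c in zip(windows, assignment):
        clusters[c].append(window)
    return centroids, clusters


def query(index, q, n_probe=1):
    """Search only the n_probe clusters whose centroids are closest to q."""
    centroids, clusters = index
    order = sorted(range(len(centroids)),
                   key=lambda c: euclidean(q, centroids[c]))
    candidates = [m for c in order[:n_probe] for m in clusters[c]]
    if not candidates:
        return None
    return min(candidates, key=lambda m: euclidean(q, m[1]))


# Toy series in which the motif (1, 2, 1) occurs twice.
series = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 5, 5, 5, 0, 0, 0, 1, 2, 1, 0]
index = build_index(series, width=3, k=3)
best = query(index, (1, 2, 1))  # (start position, matching subsequence)
```

In this sketch, `n_probe` trades accuracy for speed: probing a single cluster compares the query against only a fraction of the subsequences, while probing all clusters degenerates to a full scan.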
