Querying and mining of time series data: experimental comparison of representations and distance measures

The last decade has witnessed tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from occasional theoretical justification, has provided quantitative experimental observations. For the most part, however, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed method over some of the previously introduced ones. To provide a comprehensive validation, we conducted an extensive set of time series experiments, re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these techniques and present our comparative experimental findings regarding their effectiveness. Our experiments provide a unified validation of some of the existing achievements and, in some cases, suggest that certain claims in the literature may be unduly optimistic.
