Scaling up dynamic time warping for datamining applications

There has been much recent interest in adapting data mining algorithms to time series databases. Most of these algorithms need to compare time series, and typically some variation of Euclidean distance is used. However, as we demonstrate in this paper, Euclidean distance can be an extremely brittle distance measure. Dynamic time warping (DTW) has been suggested as a technique to allow more robust distance calculations; however, it is computationally expensive. In this paper we introduce a modification of DTW which operates on a higher-level abstraction of the data, in particular a Piecewise Aggregate Approximation (PAA). Our approach allows us to outperform DTW by one to two orders of magnitude, with no loss of accuracy.
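
The ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows textbook Euclidean distance, textbook DTW, and a simple PAA reduction. The function names (`euclidean`, `dtw`, `paa`), the squared pointwise cost, and the requirement that the series length be divisible by the number of segments are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Textbook DTW with squared pointwise cost, O(len(a) * len(b)) time.

    Only two rows of the cumulative-cost matrix are kept in memory.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0: only the corner cell is reachable
    for x in a:
        curr = [inf]  # column 0 is unreachable once a point of `a` is consumed
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            # extend the cheapest of the three neighbouring partial paths
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return math.sqrt(prev[-1])


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame.

    Assumption for this sketch: len(series) is divisible by n_segments.
    """
    if len(series) % n_segments != 0:
        raise ValueError("len(series) must be divisible by n_segments")
    width = len(series) // n_segments
    return [
        sum(series[i * width:(i + 1) * width]) / width
        for i in range(n_segments)
    ]


# Two identical bumps, one shifted by a single time step (made-up data).
q = [0, 0, 1, 2, 1, 0, 0, 0]
c = [0, 0, 0, 1, 2, 1, 0, 0]

print(euclidean(q, c))            # penalises the shift
print(dtw(q, c))                  # warping absorbs the shift
print(dtw(paa(q, 4), paa(c, 4)))  # DTW on a 2x-reduced representation
```

On this toy pair the Euclidean distance is nonzero although the shapes are identical, while DTW aligns them exactly. Running DTW on the PAA-reduced series fills a cumulative-cost matrix with about c² fewer cells for a compression factor c, which is where a speedup of this kind would come from. How the reduced-space distance relates to the full-resolution one is the subject of the paper and is not captured by this sketch.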
