An efficient and accurate method for evaluating time series similarity

A variety of techniques currently exist for measuring the similarity between time series datasets. Of these techniques, the methods whose matching criteria is bounded by a specified ε threshold value, such as the LCSS and the EDR techniques, have been shown to be robust in the presence of noise, time shifts, and data scaling. Our work proposes a new algorithm, called the Fast Time Series Evaluation (FTSE) method, which can be used to evaluate such threshold value techniques, including LCSS and EDR. Using FTSE, we show that these techniques can be evaluated faster than using either traditional dynamic programming or even warp-restricting methods such as the Sakoe-Chiba band and the Itakura Parallelogram. We also show that FTSE can be used in a framework that can evaluate a richer range of ε threshold-based scoring techniques, of which EDR and LCSS are just two examples. This framework, called Swale, extends the ε threshold-based scoring techniques to include arbitrary match rewards and gap penalties. Through extensive empirical evaluation, we show that Swale can obtain greater accuracy than existing methods.

[1]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[2]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[3]  Wesley W. Chu,et al.  An index-based approach for similarity search supporting time warping in large sequence databases , 2001, Proceedings 17th International Conference on Data Engineering.

[4]  Paul M. B. Vitányi,et al.  Clustering by compression , 2003, IEEE Transactions on Information Theory.

[5]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[6]  Donald J. Berndt,et al.  Using Dynamic Time Warping to Find Patterns in Time Series , 1994, KDD Workshop.

[7]  Christos Faloutsos,et al.  Efficient retrieval of similar time sequences under time warping , 1998, Proceedings 14th International Conference on Data Engineering.

[8]  Lei Chen,et al.  On The Marriage of Lp-norms and Edit Distance , 2004, VLDB.

[9]  Jiong Yang,et al.  CLUSEQ: efficient and effective sequence clustering , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[10]  G. R. Cross,et al.  An improved algorithm to find the length of the longest common subsequence of two strings , 1989, SIGF.

[11]  Eamonn Keogh Exact Indexing of Dynamic Time Warping , 2002, VLDB.

[12]  Christos Faloutsos,et al.  Fast Time Sequence Indexing for Arbitrary Lp Norms , 2000, VLDB.

[13]  Eamonn J. Keogh,et al.  Towards parameter-free data mining , 2004, KDD.

[14]  F. Itakura,et al.  Minimum prediction residual principle applied to speech recognition , 1975 .

[15]  Clu-istos Foutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[16]  Dimitrios Gunopulos,et al.  Indexing multi-dimensional time-series with support for multiple distance measures , 2003, KDD '03.

[17]  Christos Faloutsos,et al.  Efficiently supporting ad hoc queries in large datasets of time sequences , 1997, SIGMOD '97.

[18]  Eamonn J. Keogh,et al.  On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration , 2002, Data Mining and Knowledge Discovery.

[19]  Eamonn J. Keogh,et al.  Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases , 2001, Knowledge and Information Systems.

[20]  Eamonn J. Keogh,et al.  Scaling up Dynamic Time Warping to Massive Dataset , 1999, PKDD.

[21]  Eamonn J. Keogh,et al.  Making Time-Series Classification More Accurate Using Learned Constraints , 2004, SDM.

[22]  Dina Q. Goldin,et al.  On Similarity Queries for Time-Series Data: Constraint Specification and Implementation , 1995, CP.

[23]  Thomas G. Szymanski,et al.  A fast algorithm for computing longest common subsequences , 1977, CACM.

[24]  Christos Faloutsos,et al.  FTW: fast similarity search under the time warping distance , 2005, PODS.

[25]  Renée J. Miller,et al.  Similarity search over time-series data using wavelets , 2002, Proceedings 18th International Conference on Data Engineering.

[26]  Lei Chen,et al.  Robust and fast similarity search for moving object trajectories , 2005, SIGMOD '05.

[27]  Dimitrios Gunopulos,et al.  Discovering similar multidimensional trajectories , 2002, Proceedings 18th International Conference on Data Engineering.

[28]  Ada Wai-Chee Fu,et al.  Efficient time series matching by wavelets , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[29]  Margrit Betke,et al.  THE CAMERA MOUSE: PRELIMINARY INVESTIGATION OF AUTOMATED VISUAL TRACKING FOR COMPUTER ACCESS , 2000 .

[30]  Christos Faloutsos,et al.  Efficient Similarity Search In Sequence Databases , 1993, FODO.

[31]  Dennis Shasha,et al.  Warping indexes with envelope transforms for query by humming , 2003, SIGMOD '03.