General Hierarchical Model (GHM) to measure similarity of time series

Similarity queries are a frequent subroutine in time series databases, used to find the time series similar to a given one. The similarity measure plays a central part in this process. Previous methods do not consider the relation between point correspondences and the importance (role) of each point in the content of the time series when measuring similarity, which results in low accuracy in many real applications. In this paper, we propose a General Hierarchical Model (GHM), which determines point correspondences by the hierarchies of the points. It partitions the points of a time series into different hierarchies and then restricts each point to be compared only with points in the same hierarchy. Practical methods can be implemented on top of the model to suit particular requirements, e.g., the FFT Hierarchical Measures (FHM) given in this paper. Hierarchical filtering methods of GHM are provided for range and k-NN queries, respectively. Finally, two common data sets were used in k-NN query and clustering experiments to test the effectiveness of our approach and of competing methods. The time performance of all tested methods was compared on synthetic data sets of various sizes. The experimental results show the superiority of our approach over the competitors. We also report the experimentally measured power of the proposed filtering methods in these queries.
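The abstract does not define how FHM builds its hierarchies or how the filtering is carried out, so the following is only an illustrative sketch of one possible reading, not the paper's method. It assumes that the "hierarchies" are contiguous bands of FFT coefficients (the names `band_edges`, `hierarchical_range_filter`, and `n_levels` are hypothetical), compares coefficients only within the same band, and rejects a range-query candidate as soon as the accumulated squared distance exceeds the threshold. With an orthonormal FFT, Parseval's theorem guarantees that every partial sum lower-bounds the squared Euclidean distance, so early rejection causes no false dismissals.

```python
import numpy as np

def band_edges(n_coeffs, n_levels):
    """Split indices 0..n_coeffs-1 into contiguous bands (assumed hierarchies)."""
    return np.linspace(0, n_coeffs, n_levels + 1).astype(int)

def hierarchical_range_filter(query, candidate, eps, n_levels=4):
    """Return False if the candidate is pruned, True if it survives every level.

    Assumption: levels are FFT coefficient bands, lowest frequencies first.
    This is a generic lower-bound filter, not the GHM/FHM definition.
    """
    # Orthonormal FFT preserves Euclidean distance (Parseval's theorem).
    q = np.fft.fft(np.asarray(query, dtype=float), norm="ortho")
    c = np.fft.fft(np.asarray(candidate, dtype=float), norm="ortho")
    edges = band_edges(len(q), n_levels)
    acc = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Coefficients are compared only with those in the same band.
        acc += float(np.sum(np.abs(q[lo:hi] - c[lo:hi]) ** 2))
        if acc > eps ** 2:
            return False  # partial sum already exceeds the range threshold
    return True

t = np.arange(64)
x = np.sin(2 * np.pi * t / 16)
y = x + 0.1  # constant offset: Euclidean distance is 0.1 * sqrt(64) = 0.8
print(hierarchical_range_filter(x, y, eps=1.0))
print(hierarchical_range_filter(x, y, eps=0.5))
```

In this example the whole difference sits in the zero-frequency coefficient, so the second call is rejected after the first band alone. A k-NN variant could reuse the same loop with the current k-th best distance in place of `eps`.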
