Trend similarity and prediction in time-series databases

Many algorithms for discovering similar patterns in time-series databases involve three phases. First, sequential data in the time domain is transformed into the frequency domain using the DFT. Then, the first few data points of each transformed sequence are indexed in an R*-tree. Finally, the points in the R*-tree are compared by their distance: any pair of data points whose distance is within a certain threshold is considered similar. This approach suffers from performance problems because it emphasizes each individual data point. This paper proposes a novel method of finding similar trend patterns, rather than similar data patterns, in time-series databases. Instead of comparing data patterns in the frequency domain, the method considers only a limited number of points in the time series that play a dominant role in determining the direction of movement. Those data points are called a trend sequence. Trend sequences can be defined in various ways; here we focus on trend sequences obtained by a data smoothing technique. A trend sequence contains far fewer data points than the original data sequence, yet captures the movements of the sequence at an abstract level. Given a trend sequence, we also apply the smoothing algorithm to predict, to some extent, the next trend data point: once a trend sequence is found, the next trend data point can be anticipated. This paper therefore also presents a method for trend prediction. We observed that the approach presented in this paper can be applied both to finding similarity among many large time-series data sequences and to predicting the data points likely to follow.
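The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which smoothing technique is used or how trend points are selected, so simple exponential smoothing, the "keep the turning points of the smoothed series" rule, and all function names and parameters (`dft_distance`, `trend_sequence`, `predict_next`, `alpha`, `k`) are assumptions made for illustration only. The R*-tree index of the DFT-based approach is omitted; only the distance on the first few coefficients is shown.

```python
import cmath
import math


def dft_prefix(series, k):
    """First k DFT coefficients (naive transform), scaled by 1/sqrt(n)."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(series)) / math.sqrt(n)
        for f in range(k)
    ]


def dft_distance(a, b, k=3):
    """Euclidean distance between the first k DFT coefficients of a and b.

    This mirrors the baseline described in the abstract; two sequences
    would be reported as similar when this value is within a threshold.
    """
    fa, fb = dft_prefix(a, k), dft_prefix(b, k)
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(fa, fb)))


def exp_smooth(series, alpha=0.5):
    """Simple exponential smoothing (an assumed choice of smoother)."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def trend_sequence(series, alpha=0.5):
    """Assumed trend rule: keep the endpoints of the smoothed series and
    every interior point where its direction of movement reverses."""
    s = exp_smooth(series, alpha)
    trend = [(0, s[0])]
    for i in range(1, len(s) - 1):
        if (s[i] - s[i - 1]) * (s[i + 1] - s[i]) < 0:
            trend.append((i, s[i]))
    trend.append((len(s) - 1, s[-1]))
    return trend


def predict_next(trend_values, alpha=0.5):
    """One-step-ahead forecast: the last smoothed level of the trend values."""
    return exp_smooth(trend_values, alpha)[-1]


if __name__ == "__main__":
    series = [1, 2, 3, 4, 3, 2, 1, 2, 3, 4]
    trend = trend_sequence(series)
    print("trend points:", trend)
    print("next trend value:", predict_next([v for _, v in trend]))
    print("DFT distance to shifted copy:",
          dft_distance(series, [x + 1 for x in series]))
```

On the ten-point example series, the trend sequence under these assumptions keeps only four points, which is the kind of reduction the abstract refers to; a larger `alpha` follows the raw data more closely and generally yields more turning points.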
