Paper information - Trend similarity and prediction in time-series databases

Many algorithms for discovering similar patterns in time-series databases involve three phases. First, sequential data in the time domain is transformed into the frequency domain using the DFT. Then, the first few transformed data points are kept and represented in an R*-tree. Finally, the points in the R*-tree are compared by distance: any pair of points whose distance falls within a given threshold is considered similar. This approach suffers from performance problems because it emphasizes each individual data point. This paper proposes a novel method of finding similar trend patterns, rather than similar data patterns, in time-series databases. Instead of comparing data patterns in the frequency domain, the method considers only a limited number of points in the time series that play a dominant role in determining the direction of movement. These points are called a trend sequence. Trend sequences can be defined in various ways; we focus on deriving them with a data smoothing technique. A trend sequence contains far fewer data points than the original data sequence, yet captures the movements of the sequence at an abstract level. Given a trend sequence, we also apply the smoothing algorithm to predict the next trend data point, since once a trend sequence is found, a next trend data point can be expected to follow. This paper therefore also presents a method for trend prediction. We observed that our approach can be applied both to finding similarity among many large time-series data sequences and to predicting the data points likely to follow.
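The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not specify the smoothing technique, how dominant points are selected, or how the prediction is computed, so everything below is an assumption made for illustration rather than the authors' method: a centered moving average stands in for the smoothing step, direction changes of the smoothed series stand in for the dominant points, and simple exponential smoothing stands in for the predictor. All function names and parameters (`k`, `window`, `alpha`) are hypothetical. The DFT helpers show the baseline comparison on the first few coefficients; the R*-tree index is omitted.

```python
import cmath
import math


def dft_features(series, k=3):
    """First k DFT coefficients of series (naive DFT, scaled by 1/sqrt(n))."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(series))
        / math.sqrt(n)
        for f in range(k)
    ]


def feature_distance(a, b, k=3):
    """Euclidean distance between the first k DFT coefficients of a and b."""
    fa, fb = dft_features(a, k), dft_features(b, k)
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(fa, fb)))


def moving_average(series, window=3):
    """Centered moving average (assumed smoother); the window shrinks at the ends."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def trend_sequence(series, window=3):
    """Endpoints plus every point where the smoothed series changes direction.

    Returns (index, smoothed value) pairs. Using turning points as the
    "dominant" points is an assumption of this sketch.
    """
    s = moving_average(series, window)
    if len(s) < 3:
        return list(enumerate(s))
    keep = [0]
    for i in range(1, len(s) - 1):
        left, right = s[i] - s[i - 1], s[i + 1] - s[i]
        if left * right < 0:  # slope changes sign: local peak or valley
            keep.append(i)
    keep.append(len(s) - 1)
    return [(i, s[i]) for i in keep]


def predict_next(values, alpha=0.5):
    """One-step forecast by simple exponential smoothing (assumed predictor)."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level
```

For example, `trend_sequence([1, 2, 3, 4, 3, 2, 1, 2, 3, 4])` reduces ten points to the two endpoints plus one peak and one valley, and passing the values of that trend sequence to `predict_next` yields a single forecast for the next trend point. Two series would count as similar under the baseline when `feature_distance` falls below a chosen threshold.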

Jong P. Yoon | Sungrim Kim | Jieun Lee

[1] Antonin Guttman, et al. R-trees: a dynamic index structure for spatial searching, 1984, SIGMOD '84.

[2] Christos Faloutsos, et al. Online data mining for co-evolving time sequences, 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[3] Athman Bouguettaya, et al. On-Line Clustering, 1996, IEEE Trans. Knowl. Data Eng.

[4] Peter J. Rousseeuw, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1990.

[5] Patrick E. O'Neil, et al. Improved query performance with variant indexes, 1997, SIGMOD '97.

[6] Hans-Peter Kriegel, et al. The R*-tree: an efficient and robust access method for points and rectangles, 1990, SIGMOD '90.

[7] Christos Faloutsos, et al. Efficient Similarity Search In Sequence Databases, 1993, FODO.

[8] Christos Faloutsos, et al. Efficient retrieval of similar time sequences under time warping, 1998, Proceedings 14th International Conference on Data Engineering.

[9] Philip S. Yu, et al. Adaptive query processing for time-series data, 1999, KDD '99.

[10] Ramakrishnan Srikant, et al. Mining Sequential Patterns: Generalizations and Performance Improvements, 1996, EDBT.

[11] Ming-Chuan Wu, et al. Query optimization for selections using bitmaps, 1999, SIGMOD '99.

[12] Yannis E. Ioannidis, et al. An efficient bitmap encoding scheme for selection queries, 1999, SIGMOD '99.

[13] Alberto O. Mendelzon, et al. Similarity-based queries for time series data, 1997, SIGMOD '97.

[14] Haixun Wang, et al. Landmarks: a new model for similarity-based pattern querying in time series databases, 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[15] Kyuseok Shim, et al. Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases, 1995, VLDB.