Paper Information - Physical Database Design for Efficient Time-Series Similarity Search

Similarity search in time-series databases finds data sequences whose changing patterns are similar to those of a query sequence. For efficient processing, it normally employs a multi-dimensional index. To alleviate the well-known dimensionality curse, previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences and take only the first two or three DFT coefficients as organizing attributes. Beyond this ad hoc approach, there has been no research effort to devise a systematic guideline for choosing the best organizing attributes. This paper first points out the problems that occur in the previous methods and then proposes a novel solution for constructing optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database and identifies the organizing attributes with the best discrimination power. It also uses a cost model to determine the optimal number of organizing attributes for efficient similarity search. Through a series of experiments, we show that the proposed method significantly outperforms the previous ones.
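The earlier DFT-based approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's proposed method; it only shows, under stated assumptions, how the first few DFT coefficients might serve as organizing attributes. The function names (`dft`, `organizing_attributes`, `euclidean`) and the sample sequences are hypothetical, and a unitary scaling of 1/sqrt(n) is assumed so that Euclidean distance is preserved by the transform.

```python
import cmath
import math


def dft(seq):
    """Unitary DFT: scaling by 1/sqrt(n) preserves Euclidean distance (Parseval)."""
    n = len(seq)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(seq))
        for f in range(n)
    ]


def organizing_attributes(seq, k=2):
    """Keep only the first k DFT coefficients as the index key (assumed k)."""
    return dft(seq)[:k]


def euclidean(a, b):
    """Euclidean distance; abs() handles both real and complex components."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical sample sequences of equal length.
data = [1.0, 2.0, 4.0, 3.0, 2.0, 1.0, 0.0, 1.0]
query = [1.0, 3.0, 4.0, 2.0, 2.0, 0.0, 1.0, 1.0]

true_dist = euclidean(data, query)
feature_dist = euclidean(organizing_attributes(data),
                         organizing_attributes(query))

# The distance on the truncated coefficients never exceeds the true distance,
# so filtering on the index key produces no false dismissals.
print(feature_dist <= true_dist)
```

Because the truncated distance is a lower bound on the true distance, a range query on the index returns a superset of the answers, which is then refined against the full sequences. Which coefficients to keep, and how many, is exactly the choice the paper argues should be made systematically rather than fixed at the first two or three.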
