Paper information - Similarity search for multidimensional data sequences

Similarity search for multidimensional data sequences

Time series data, which are sequences of one-dimensional real numbers, have been studied in various database applications. We extend the traditional similarity search methods on time series data to support multidimensional data sequences, such as video streams. We investigate the problem of retrieving similar multidimensional data sequences from a large database. To prune irrelevant sequences in a database, we introduce correct and efficient similarity functions. Both data sequences and query sequences are partitioned into subsequences, and each subsequence is represented by a Minimum Bounding Rectangle (MBR). Query processing is based on these MBRs instead of scanning the data elements of entire sequences. Our method is designed (1) to select candidate sequences in a database, and (2) to find the subsequences of a selected sequence, each of which falls under the given threshold. The latter is of special importance when retrieving subsequences from large and complex sequences such as video. With it, we do not need to browse the whole of the selected video stream, only the sub-streams, to find the scene we want. We have performed extensive experiments on synthetic as well as real data sequences (a collection of TV news, dramas, and documentary videos) to evaluate our proposed method. The experiments demonstrate that 73-94 percent of irrelevant sequences are pruned using the proposed method, resulting in response times 16-28 times faster than those of sequential search.
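The MBR-based pruning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's actual similarity functions or partitioning scheme: it assumes fixed-length segments (the `seg_len` parameter), Euclidean distance, and a deliberately loose pruning rule, and all function names are hypothetical. The rule relies on one fact: the minimum distance between two MBRs is a lower bound on the distance between any point in one and any point in the other, so if every MBR pair is farther apart than the threshold `eps`, no alignment of the query with the data sequence can have a mean point distance within `eps`.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
MBR = Tuple[List[float], List[float]]  # (low corner, high corner)


def partition(seq: Sequence[Point], seg_len: int) -> List[Sequence[Point]]:
    """Split a sequence into consecutive segments of at most seg_len points.

    Fixed-length segmentation is an assumption made for this sketch.
    """
    return [seq[i:i + seg_len] for i in range(0, len(seq), seg_len)]


def mbr_of(points: Sequence[Point]) -> MBR:
    """Smallest axis-aligned rectangle that contains all the points."""
    dims = range(len(points[0]))
    low = [min(p[d] for p in points) for d in dims]
    high = [max(p[d] for p in points) for d in dims]
    return low, high


def mbr_min_dist(a: MBR, b: MBR) -> float:
    """Minimum Euclidean distance between two MBRs (0.0 if they overlap)."""
    (a_low, a_high), (b_low, b_high) = a, b
    total = 0.0
    for al, ah, bl, bh in zip(a_low, a_high, b_low, b_high):
        gap = max(bl - ah, al - bh, 0.0)  # per-dimension separation
        total += gap * gap
    return math.sqrt(total)


def can_prune(query: Sequence[Point], data: Sequence[Point],
              seg_len: int, eps: float) -> bool:
    """True if no point of data can lie within eps of any query point."""
    q_mbrs = [mbr_of(s) for s in partition(query, seg_len)]
    d_mbrs = [mbr_of(s) for s in partition(data, seg_len)]
    lower_bound = min(mbr_min_dist(q, d) for q in q_mbrs for d in d_mbrs)
    return lower_bound > eps


def candidate_segments(query: Sequence[Point], data: Sequence[Point],
                       seg_len: int, eps: float) -> List[int]:
    """Indices of data segments whose MBR lies within eps of some query MBR."""
    q_mbrs = [mbr_of(s) for s in partition(query, seg_len)]
    hits = []
    for idx, segment in enumerate(partition(data, seg_len)):
        d_mbr = mbr_of(segment)
        if min(mbr_min_dist(q, d_mbr) for q in q_mbrs) <= eps:
            hits.append(idx)
    return hits


if __name__ == "__main__":
    query = [(0.0, 0.0), (1.0, 1.0)]
    far = [(10.0, 10.0), (11.0, 11.0), (12.0, 12.0), (13.0, 13.0)]
    near = [(10.0, 10.0), (11.0, 11.0), (0.5, 0.5), (1.5, 1.5)]
    print(can_prune(query, far, seg_len=2, eps=1.0))
    print(can_prune(query, near, seg_len=2, eps=1.0))
    print(candidate_segments(query, near, seg_len=2, eps=1.0))
```

The two functions mirror the two goals listed in the abstract: `can_prune` discards whole sequences without reading their elements, and `candidate_segments` narrows a surviving sequence down to the sub-streams worth inspecting. Surviving candidates would still need an exact distance check, since a lower bound can only rule sequences out.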

[1] Hans-Peter Kriegel, et al. The pyramid-technique: towards breaking the curse of dimensionality, 1998, SIGMOD '98.

[2] Dragutin Petkovic, et al. Query by Image and Video Content: The QBIC System, 1995, Computer.

[3] Christos Faloutsos, et al. Efficient Similarity Search In Sequence Databases, 1993, FODO.

[4] Antonin Guttman, et al. R-trees: a dynamic index structure for spatial searching, 1984, SIGMOD '84.

[5] Hans-Peter Kriegel, et al. The R*-tree: an efficient and robust access method for points and rectangles, 1990, SIGMOD '90.

[6] Dina Q. Goldin, et al. On Similarity Queries for Time-Series Data: Constraint Specification and Implementation, 1995, CP.

[7] Christian Böhm, et al. Independent quantization: an index compression technique for high-dimensional data spaces, 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[8] Davood Rafiei, et al. On similarity-based queries for time series data, 1997, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[9] Hans-Peter Kriegel, et al. The X-tree: An Index Structure for High-Dimensional Data, 2001, VLDB.

[10] Nick Roussopoulos, et al. Nearest neighbor queries, 1995, SIGMOD '95.

[11] Alberto O. Mendelzon, et al. Similarity-based queries for time series data, 1997, SIGMOD '97.

[12] Christos Faloutsos, et al. Fast subsequence matching in time-series databases, 1994, SIGMOD '94.

[13] H. V. Jagadish. Linear Clustering of Objects with Multiple Attributes, 1998.

[14] Christos Faloutsos, et al. Efficient retrieval of similar time sequences under time warping, 1998, Proceedings 14th International Conference on Data Engineering.

[15] A. Guttman, et al. A Dynamic Index Structure for Spatial Searching, 1984, SIGMOD 1984.

[16] Shin'ichi Satoh, et al. The SR-tree: an index structure for high-dimensional nearest neighbor queries, 1997, SIGMOD '97.