Efficient Similarity Search in Sequence Databases

We propose an indexing method for processing similarity queries over time sequences. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain. The crucial observation is that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which implies that the Euclidean distance between two sequences is the same in the time domain and in the frequency domain. Having thus mapped sequences to a lower-dimensional space by keeping only the first few Fourier coefficients, we use R*-trees to index the sequences and efficiently answer similarity queries. Experimental results show that our method outperforms search based on sequential scanning, and that a few coefficients (1-3) are adequate to provide good performance. The performance gain of our method increases with the number and length of the sequences.
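A minimal sketch of the filter-and-refine idea described above, using only the Python standard library. It assumes equal-length sequences, whole-sequence matching, and the unitary DFT normalization X_f = (1/√n) Σ_t x_t e^(-2πi f t / n), under which Parseval's theorem reads Σ|x_t|² = Σ|X_f|². Because the DFT is linear, the distance between two sequences equals the distance between their full spectra; keeping only the first k coefficients drops nonnegative terms, so the truncated distance can never exceed the true one and the filter dismisses no qualifying sequence. The helper names (`dft`, `features`, `build_index`, `range_query`) are hypothetical, and a plain list scanned linearly stands in for the R*-tree; a real spatial index would store each complex coefficient as two real coordinates.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    # Unitary DFT: X_f = (1/sqrt(n)) * sum_t x_t * exp(-2j*pi*f*t/n).
    # This scaling makes sum|x_t|^2 == sum|X_f|^2 (Parseval) hold exactly.
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def euclidean(a: Sequence[complex], b: Sequence[complex]) -> float:
    # Euclidean distance between two equal-length real or complex vectors.
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def features(x: Sequence[float], k: int) -> List[complex]:
    # Hypothetical feature extractor: keep only the first k DFT coefficients.
    return dft(x)[:k]


def build_index(db: Sequence[Sequence[float]], k: int) -> List[List[complex]]:
    # Stand-in for inserting each k-dimensional feature point into an R*-tree.
    return [features(x, k) for x in db]


def range_query(
    db: Sequence[Sequence[float]],
    index: List[List[complex]],
    query: Sequence[float],
    eps: float,
    k: int,
) -> Tuple[List[int], List[int]]:
    # Filter: truncated distance <= true distance, so no true answer is lost.
    q_feat = features(query, k)
    candidates = [i for i, pt in enumerate(index) if euclidean(pt, q_feat) <= eps]
    # Refine: discard false alarms using the exact time-domain distance.
    answers = [i for i in candidates if euclidean(db[i], query) <= eps]
    return candidates, answers


if __name__ == "__main__":
    ramp = [float(t + 1) for t in range(8)]
    db = [
        ramp,                                         # same shape as the query
        [1.1, 2.1, 2.9, 4.0, 5.2, 5.9, 7.1, 8.0],     # small noise
        ramp[::-1],                                   # reversed: far away
        [v + (-1) ** t for t, v in enumerate(ramp)],  # same low frequencies, high-frequency wiggle
    ]
    query = ramp[:-1] + [8.1]
    index = build_index(db, k=2)
    cands, hits = range_query(db, index, query, eps=0.5, k=2)
    print(cands, hits)  # [0, 1, 3] [0, 1]
```

In this toy run the reversed ramp is pruned by the filter, while the last sequence is a false alarm: its alternating ±1 component lives entirely in the highest frequency, so its first two coefficients match the ramp's, and only the refine step rejects it. The example is chosen to show both behaviors and says nothing about the performance figures reported in the abstract.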

Christos Faloutsos, et al. Efficient Similarity Search in Sequence Databases. FODO, 1993.