Efficient Similarity Search in Sequence Databases

We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R*-trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1-3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
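The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. The function names (`dft`, `features`, `dist`, `range_query`) are hypothetical. The DFT uses a 1/sqrt(n) scaling so that Parseval's theorem holds exactly. A plain linear scan over the truncated feature vectors stands in for the R*-tree. The two-step "filter on features, then verify in the time domain" query is an assumed reading of how the index would be used. Because dropping coefficients can only shrink the Euclidean distance, the filter step should not discard any true match.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT with 1/sqrt(n) scaling (assumed normalization),
    chosen so that Euclidean distances are preserved (Parseval)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def features(x, k=3):
    """Keep only the first k DFT coefficients as the low-dimensional key."""
    return dft(x)[:k]


def dist(a, b):
    """Euclidean distance between two real or complex vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def range_query(db, query, eps, k=3):
    """Return all sequences in db within distance eps of query.

    Step 1 filters on the truncated features (a linear scan here; the
    paper uses an R*-tree instead). Step 2 verifies each candidate with
    the full time-domain distance to remove false alarms.
    """
    qf = features(query, k)
    tol = 1e-9  # guard against floating-point rounding in the filter
    candidates = [s for s in db if dist(features(s, k), qf) <= eps + tol]
    return [s for s in candidates if dist(s, query) <= eps]


if __name__ == "__main__":
    db = [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0],
        [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    ]
    query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    for match in range_query(db, query, eps=1.5):
        print(match)
```

In this sketch the feature distance never exceeds the true distance, so the filter can return false alarms but not false dismissals; the verification step removes the false alarms.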
