Fast Time Sequence Indexing for Arbitrary Lp Norms

Fast indexing in time sequence databases for similarity searching has attracted a lot of research recently. Most of the proposals, however, typically centered around the Euclidean distance and its derivatives. We examine the problem of multimodal similarity search in which users can choose the best one from multiple similarity models for their needs. In this paper, we present a novel and fast indexing scheme for time sequences, when the distance function is any of arbitrary Lp norms (p = 1; 2; : : : ;1). One feature of the proposed method is that only one index structure is needed for all Lp norms including the popular Euclidean distance (L2 norm). Our scheme achieves significant speedups over the state of the art: extensive experiments on real and synthetic time sequences show that the proposed method is comparable to the best competitor forL2 andL1 norms, but significantly (up to 10 times) faster for L1 norm.

[1]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[2]  Heikki Mannila,et al.  Discovering Generalized Episodes Using Minimal Occurrences , 1996, KDD.

[3]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[4]  Jiawei Han,et al.  Efficient and Effective Clustering Methods for Spatial Data Mining , 1994, VLDB.

[5]  R. Ng,et al.  Eecient and Eeective Clustering Methods for Spatial Data Mining , 1994 .

[6]  Eamonn J. Keogh,et al.  A Simple Dimensionality Reduction Technique for Fast Similarity Search in Large Time Series Databases , 2000, PAKDD.

[7]  Christos Faloutsos,et al.  Online data mining for co-evolving time sequences , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[8]  Jeffrey Scott Vitter,et al.  Approximate computation of multidimensional aggregates of sparse data using wavelets , 1999, SIGMOD '99.

[9]  Christos Faloutsos,et al.  Efficient Similarity Search In Sequence Databases , 1993, FODO.

[10]  Clu-istos Foutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[11]  Christos Faloutsos,et al.  Efficiently supporting ad hoc queries in large datasets of time sequences , 1997, SIGMOD '97.

[12]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[13]  Tomasz Imielinski,et al.  Database Mining: A Performance Perspective , 1993, IEEE Trans. Knowl. Data Eng..

[14]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[15]  Alberto O. Mendelzon,et al.  Similarity-based queries for time series data , 1997, SIGMOD '97.

[16]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[17]  S. Berberian A first course in real analysis , 1994 .

[18]  Yadolah Dodge,et al.  L[1]-statistical procedures and related topics , 1997 .

[19]  Heikki Mannila,et al.  Rule Discovery from Time Series , 1998, KDD.

[20]  Ingrid Daubechies,et al.  Ten Lectures on Wavelets , 1992 .

[21]  Kyuseok Shim,et al.  Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases , 1995, VLDB.

[22]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[23]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[24]  J. Davenport Editor , 1960 .

[25]  Nikos D. Sidiropoulos,et al.  Mathematical programming algorithms for regression-based nonlinear filtering in RN , 1999, IEEE Trans. Signal Process..

[26]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[27]  Philip S. Yu,et al.  An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[28]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[29]  Dina Q. Goldin,et al.  On Similarity Queries for Time-Series Data: Constraint Specification and Implementation , 1995, CP.

[30]  Christos Faloutsos,et al.  The R+-Tree: A Dynamic Index for Multi-Dimensional Objects , 1987, VLDB.

[31]  Peter J. Rousseeuw,et al.  Robust regression and outlier detection , 1987 .

[32]  Christos Faloutsos,et al.  A signature technique for similarity-based queries , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[33]  VitterJeffrey Scott,et al.  Approximate computation of multidimensional aggregates of sparse data using wavelets , 1999 .

[34]  Christos Faloutsos,et al.  FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.

[35]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[36]  Christos Faloutsos,et al.  Efficient retrieval of similar time sequences under time warping , 1998, Proceedings 14th International Conference on Data Engineering.

[37]  C. Faloutsos Eecient Similarity Search in Sequence Databases , 1993 .

[38]  Philip S. Yu,et al.  HierarchyScan: a hierarchical similarity search algorithm for databases of long sequences , 1996, Proceedings of the Twelfth International Conference on Data Engineering.