A multilevel distance-based index structure for multivariate time series

Multivariate time series (MTS) datasets are common in multimedia, medical, and financial applications. In previous work, we introduced a similarity measure for MTS datasets, termed Eros (extended Frobenius norm), which is based on the Frobenius norm and principal component analysis (PCA). Eros computes the similarity between two MTS items by measuring how close their corresponding principal components (PCs) are, using the eigenvalues as weights. Because the weights are derived from the data items in the database, they change whenever data are inserted into or removed from the database. In this paper, we propose a distance-based index structure, Muse (Multilevel distance-based index structure for Eros), for efficient retrieval of MTS items using Eros. Muse builds up to z levels, each as a distance-based index structure that does not use the weights. At query time, Muse combines the z levels with the weights, which allows the weights to change without rebuilding the index structure. To evaluate the efficiency of Muse, we performed several experiments on a set of synthetically generated clustered datasets. The results show that Muse outperforms both sequential scan and M-tree.
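To make the separation between weight-free levels and query-time weights concrete, the following is a minimal sketch, not the paper's implementation. It assumes that Eros takes the form of a weighted sum of the absolute inner products of corresponding unit-length PCs, with weights that sum to 1; that form is not spelled out in this abstract. The function names `per_level_similarity` and `combine_levels` and the toy data are hypothetical, and the sketch does not build any index structure. It only shows that the per-level terms can be computed once and recombined when the weights change.

```python
import math

def per_level_similarity(pcs_a, pcs_b):
    """Weight-free term for each level i: |<a_i, b_i>|.

    pcs_a and pcs_b are lists of unit-length principal components,
    ordered the same way (assumed layout, for illustration only).
    """
    return [abs(sum(x * y for x, y in zip(a, b)))
            for a, b in zip(pcs_a, pcs_b)]

def combine_levels(levels, weights):
    """Query-time step: weighted sum of the per-level terms."""
    return sum(w * s for w, s in zip(weights, levels))

# Toy PCs for two items with three variables. Item B's first two PCs
# are item A's rotated by 60 degrees; the third PC is shared.
c, s = math.cos(math.pi / 3), math.sin(math.pi / 3)
pcs_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pcs_b = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

# Computed once, without any weights.
levels = per_level_similarity(pcs_a, pcs_b)

# The weights change (for example after an insertion), but only the
# cheap combination step is repeated.
sim_before = combine_levels(levels, [0.7, 0.2, 0.1])
sim_after = combine_levels(levels, [0.5, 0.3, 0.2])
print(sim_before, sim_after)
```

In this sketch the two similarity values differ even though `levels` is computed only once, which mirrors the property the abstract claims for Muse: a change in the weights does not force the weight-free part to be rebuilt.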
