A PCA-based similarity measure for multivariate time series

Multivariate time series (MTS) datasets are common in multimedia, medical and financial applications. We propose a similarity measure for MTS datasets, <i>Eros</i> (<i>E</i>xtended F<i>ro</i>beniu<i>s</i> norm), which is based on Principal Component Analysis (PCA). <i>Eros</i> applies PCA to MTS datasets represented as matrices to generate principal components and their associated eigenvalues. These principal components and eigenvalues are then used to measure the similarity between MTS matrices. Although <i>Eros</i> itself does not satisfy the triangle inequality, without which existing multidimensional indexing structures may not be usable, we obtain lower and upper bounds that do satisfy it. To show the validity of <i>Eros</i> for similarity search on MTS datasets, we performed several experiments on three datasets (two real-world and one synthetic). The results show that our approach outperforms traditional similarity measures for MTS datasets, such as Euclidean Distance (ED), Dynamic Time Warping (DTW), Weighted Sum SVD (WSSVD) and the PCA similarity factor (S<sc>PCA</sc>), in terms of precision/recall.
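
The idea of comparing two MTS matrices through their principal components and eigenvalues can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the formula, so the code assumes a weighted sum of absolute cosines between corresponding principal components, with weights derived from the eigenvalues of only the two items being compared. The function names `pca_components` and `eros_like_similarity` and the weighting scheme are assumptions made for illustration.

```python
import numpy as np

def pca_components(mts):
    """Return (eigenvalues, eigenvectors) of the covariance matrix of an MTS item.

    mts has shape (n_timesteps, n_variables). Eigenvalues come back in
    descending order; eigenvectors are the columns of the second array.
    """
    cov = np.cov(mts, rowvar=False)      # (n_variables, n_variables)
    u, s, _ = np.linalg.svd(cov)         # cov is symmetric PSD, so SVD gives its eigenpairs
    return s, u

def eros_like_similarity(a, b, weights=None):
    """Weighted sum of |cosine| between corresponding principal components.

    Assumption: when no weights are given, they are the normalized mean of
    the two items' eigenvalue proportions. A dataset-wide weighting would be
    passed in through `weights` instead.
    """
    s_a, v_a = pca_components(a)
    s_b, v_b = pca_components(b)
    if weights is None:
        mean = (s_a / s_a.sum() + s_b / s_b.sum()) / 2
        weights = mean / mean.sum()
    # Column-wise inner products of unit eigenvectors; abs() removes sign ambiguity.
    cosines = np.abs(np.sum(v_a * v_b, axis=0))
    return float(np.dot(weights, cosines))

rng = np.random.default_rng(0)
a = rng.normal(size=(50, 3))   # 50 time steps, 3 variables
b = rng.normal(size=(80, 3))   # a different length is fine; variable count must match
similarity = eros_like_similarity(a, b)
```

Because only the covariance structure is compared, the two items may have different numbers of time steps, as long as they share the same set of variables. The resulting value lies between 0 and 1, and an item compared with itself scores 1.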
