An Examination of Multivariate Time Series Hashing with Applications to Health Care

As large-scale multivariate time series data become increasingly common in application domains such as health care and traffic analysis, researchers are challenged to build efficient tools to analyze these data and provide useful insights. Similarity search, a basic operator for many machine learning and data mining algorithms, has been studied extensively, leading to several efficient solutions. However, similarity search for multivariate time series data is intrinsically challenging because (1) there is no consensus on what constitutes a good similarity metric for multivariate time series data and (2) calculating the similarity score between two time series is often computationally expensive. In this paper, we address this problem by applying a generalized hashing framework, namely kernelized locality-sensitive hashing, to accelerate time series similarity search under a series of representative similarity metrics. Experimental results on three large-scale clinical data sets demonstrate the effectiveness of the proposed approach.
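To make the idea of hashing for similarity search concrete, the following is a minimal sketch of plain random-hyperplane locality-sensitive hashing applied to flattened multivariate time series. It is not the kernelized variant named in the abstract, which replaces the explicit dot product with kernel evaluations so that arbitrary similarity metrics can be used. The function names, the toy data, the 16-bit code length, and the assumption that all series have the same length are illustrative choices only and do not come from the paper.

```python
import random


def flatten(series):
    """Flatten a multivariate series (a list of time steps, each a list of variables)."""
    return [value for step in series for value in step]


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(series, hyperplanes):
    """Return one bit per hyperplane: 1 if the series lies on its non-negative side."""
    x = flatten(series)
    return tuple(
        1 if sum(w * v for w, v in zip(plane, x)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Count the positions at which two binary codes differ."""
    return sum(p != q for p, q in zip(a, b))


# Toy data (hypothetical): 3 time steps, 2 variables per step.
query = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]]
near = [[0.1, 1.0], [0.5, 1.4], [1.0, 2.1]]
opposite = [[-v for v in step] for step in query]

planes = make_hyperplanes(dim=6, n_bits=16, seed=42)
code_query = hash_code(query, planes)
code_near = hash_code(near, planes)
code_opposite = hash_code(opposite, planes)

print("query vs near:    ", hamming(code_query, code_near))
print("query vs opposite:", hamming(code_query, code_opposite))
```

Series with a small angle between them tend to receive codes with a small Hamming distance, so candidate neighbors can be found by comparing short binary codes rather than by evaluating an expensive similarity score against every series in the database.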
