Approximate embedding-based subsequence matching of time series

A method for approximate subsequence matching is introduced, that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors is used to efficiently identify relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.

[1]  Eamonn J. Keogh,et al.  Scaling up dynamic time warping for datamining applications , 2000, KDD '00.

[2]  Yang-Sae Moon,et al.  Duality-based subsequence matching in time-series databases , 2001, Proceedings 17th International Conference on Data Engineering.

[3]  Klemens Böhm,et al.  Trading Quality for Time with Nearest Neighbor Search , 2000, EDBT.

[4]  Kenneth Rose,et al.  VQ-index: an index structure for similarity searching in multimedia databases , 2002, MULTIMEDIA '02.

[5]  H. Gabriela,et al.  Cluster-preserving Embedding of Proteins , 1999 .

[6]  Nikos Mamoulis,et al.  Fast and Exact Warping of Time Series Using Adaptive Segmental Approximations , 2005, Machine Learning.

[7]  Anthony K. H. Tung,et al.  LDC: enabling search by partial distance in a hyper-dimensional space , 2004, Proceedings. 20th International Conference on Data Engineering.

[8]  Eamonn J. Keogh,et al.  Exact indexing of dynamic time warping , 2002, Knowledge and Information Systems.

[9]  Sharad Mehrotra,et al.  Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces , 2000, VLDB.

[10]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[11]  Haifeng Jiang,et al.  Ranked Subsequence Matching in Time-Series Databases , 2007, VLDB.

[12]  Ramesh C. Jain,et al.  Similarity indexing: algorithms and performance , 1996, Electronic Imaging.

[13]  Ryuichi Oka Spotting Method for Classification of Real World Data , 1998, Comput. J..

[14]  Peter Morguet,et al.  Spotting dynamic hand gestures in video image sequences using hidden Markov models , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[15]  Yang-Sae Moon,et al.  General match: a subsequence matching method in time-series databases based on generalized windows , 2002, SIGMOD '02.

[16]  Christian Böhm,et al.  Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases , 2001, CSUR.

[17]  Kaizhong Zhang,et al.  An Index Structure for Data Mining and Clustering , 2000, Knowledge and Information Systems.

[18]  Christos Faloutsos,et al.  Stream Monitoring under the Time Warping Distance , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[19]  Christos Faloutsos,et al.  FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.

[20]  Jin-Hyung Kim,et al.  An HMM-Based Threshold Model Approach for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Hanan Samet,et al.  Properties of Embedding Methods for Similarity Searching in Metric Spaces , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Hakan Ferhatosmanoglu,et al.  Dynamic Dimensionality Reduction and Similarity Distance Computation by Inner Product Approximations , 1999, CIKM 1999.

[23]  Stan Sclaroff,et al.  Accurate and Efficient Gesture Spotting via Pruning and Subgesture Reasoning , 2005, ICCV-HCI.

[24]  George Kollios,et al.  Query-sensitive embeddings , 2007, ACM Trans. Database Syst..

[25]  Dennis Shasha,et al.  Warping indexes with envelope transforms for query by humming , 2003, SIGMOD '03.

[26]  R. Manmatha,et al.  Word image matching using dynamic time warping , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[27]  Alberto O. Mendelzon,et al.  Similarity-based queries for time series data , 1997, SIGMOD '97.

[28]  Dimitrios Gunopulos,et al.  Indexing multi-dimensional time-series with support for multiple distance measures , 2003, KDD '03.

[29]  Tassos Argyros,et al.  Efficient subsequence matching in time series databases under time and amplitude transformations , 2003, Third IEEE International Conference on Data Mining.

[30]  Dina Q. Goldin,et al.  On Similarity Queries for Time-Series Data: Constraint Specification and Implementation , 1995, CP.

[31]  Masatoshi Yoshikawa,et al.  The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation , 2000, VLDB.

[32]  Ömer Egecioglu,et al.  Dimensionality reduction and similarity computation by inner-product approximations , 2000, IEEE Transactions on Knowledge and Data Engineering.

[33]  Edward Y. Chang,et al.  Clustering for Approximate Similarity Search in High-Dimensional Spaces , 2002, IEEE Trans. Knowl. Data Eng..

[34]  Clu-istos Foutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[35]  Hanan Samet,et al.  Index-driven similarity search in metric spaces (Survey Article) , 2003, TODS.

[36]  Tamer Kahveci,et al.  Reference-based indexing of sequence databases , 2006, VLDB.

[37]  Steve B. Jiang,et al.  Subsequence matching on structured time series data , 2005, SIGMOD '05.

[38]  Ambuj K. Singh,et al.  Dimensionality Reduction for Similarity Searching in Dynamic Databases , 1999, Comput. Vis. Image Underst..

[39]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[40]  Wesley W. Chu,et al.  Segment-based approach for subsequence searches in sequence databases , 2001, Comput. Syst. Sci. Eng..

[41]  Christos Faloutsos,et al.  FTW: fast similarity search under the time warping distance , 2005, PODS.

[42]  Wesley W. Chu,et al.  Similarity search of time-warped subsequences via a suffix tree , 2003, Inf. Syst..

[43]  Christos Faloutsos,et al.  Efficient retrieval of similar time sequences under time warping , 1998, Proceedings 14th International Conference on Data Engineering.