Speeding up dynamic time warping distance for sparse time series data

Dynamic time warping (DTW) distance has been effectively used in mining time series data in a multitude of domains. However, in its original formulation DTW is extremely inefficient in comparing long sparse time series, containing mostly zeros and some unevenly spaced nonzero observations. Original DTW distance does not take advantage of this sparsity, leading to redundant calculations and a prohibitively large computational cost for long time series. We derive a new time warping similarity measure (AWarp) for sparse time series that works on the run-length encoded representation of sparse time series. The complexity of AWarp is quadratic on the number of observations as opposed to the range of time of the time series. Therefore, AWarp can be several orders of magnitude faster than DTW on sparse time series. AWarp is exact for binary-valued time series and a close approximation of the original DTW distance for any-valued series. We discuss useful variants of AWarp: bounded (both upper and lower), constrained, and multidimensional. We show applications of AWarp to three data mining tasks including clustering, classification, and outlier detection, which are otherwise not feasible using classic DTW, while producing equivalent results. Potential areas of application include bot detection, human activity classification, search trend analysis, seismic analysis, and unusual review pattern mining.

[1]  Eamonn J. Keogh,et al.  Scaling up dynamic time warping for datamining applications , 2000, KDD '00.

[2]  Eamonn J. Keogh,et al.  Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs , 2010, 2010 IEEE International Conference on Data Mining.

[3]  Eamonn J. Keogh,et al.  Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[4]  Eamonn J. Keogh,et al.  A Novel Approximation to Dynamic Time Warping allows Anytime Clustering of Massive Time Series Datasets , 2012, SDM.

[5]  Eamonn J. Keogh,et al.  Exact Discovery of Time Series Motifs , 2009, SDM.

[6]  Ira Assent,et al.  Anticipatory DTW for Efficient Similarity Search in Time Series Databases , 2009, Proc. VLDB Endow..

[7]  Eamonn J. Keogh,et al.  Exact indexing of dynamic time warping , 2002, Knowledge and Information Systems.

[8]  Eamonn J. Keogh,et al.  Detecting time series motifs under uniform scaling , 2007, KDD '07.

[9]  Javid Taheri,et al.  SparseDTW: A Novel Approach to Speed up Dynamic Time Warping , 2009, AusDM.

[10]  Eamonn J. Keogh,et al.  Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[11]  Eamonn J. Keogh,et al.  Online discovery and maintenance of time series motifs , 2010, KDD.

[12]  Clu-istos Foutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[13]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[14]  Mirko van der Baan,et al.  Automatic approaches for seismic to well tying , 2014 .

[15]  Jun Wang,et al.  On the Non-Trivial Generalization of Dynamic Time Warping to the Multi-Dimensional Case , 2015, SDM.

[16]  Abdullah Mueen,et al.  Enumeration of time series motifs of all lengths , 2013, 2013 IEEE 13th International Conference on Data Mining.

[17]  Eamonn J. Keogh,et al.  Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy , 2015, KDD.

[18]  Abeer Alwan,et al.  Dynamic time warping and sparse representation classification for birdsong phrase classification using limited training data. , 2015, The Journal of the Acoustical Society of America.

[19]  Eamonn J. Keogh,et al.  Iterative Deepening Dynamic Time Warping for Time Series , 2002, SDM.

[20]  Donald J. Berndt,et al.  Using Dynamic Time Warping to Find Patterns in Time Series , 1994, KDD Workshop.

[21]  Kenneth R. Anderson,et al.  Dynamic waveform matching , 1983, Inf. Sci..

[22]  Hossein Hamooni,et al.  DeBot: Twitter Bot Detection via Warped Correlation , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[23]  Eamonn J. Keogh,et al.  iSAX: disk-aware mining and indexing of massive time series datasets , 2009, Data Mining and Knowledge Discovery.

[24]  R. Manmatha,et al.  Word image matching using dynamic time warping , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[25]  Eamonn J. Keogh,et al.  Disk aware discord discovery: finding unusual time series in terabyte sized datasets , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[26]  Hossein Hamooni,et al.  Dual-Domain Hierarchical Classification of Phonetic Time Series , 2014, 2014 IEEE International Conference on Data Mining.

[27]  Philip Chan,et al.  Toward accurate dynamic time warping in linear time and space , 2007, Intell. Data Anal..

[28]  Gustavo E. A. P. A. Batista,et al.  Speeding Up All-Pairwise Dynamic Time Warping Matrix Calculation , 2016, SDM.

[29]  Eamonn J. Keogh,et al.  Discovery of Meaningful Rules in Time Series , 2015, KDD.

[30]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[31]  Diane J. Cook,et al.  CASAS: A Smart Home in a Box , 2013, Computer.

[32]  Dimitrios Hatzinakos,et al.  Gait recognition using dynamic time warping , 2004, IEEE 6th Workshop on Multimedia Signal Processing, 2004..

[33]  Jason Lines,et al.  A shapelet transform for time series classification , 2012, KDD.

[34]  Eamonn J. Keogh,et al.  Experimental comparison of representation methods and distance measures for time series data , 2010, Data Mining and Knowledge Discovery.

[35]  K. Selçuk Candan,et al.  sDTW: Computing DTW Distances using Locally Relevant Constraints based on Salient Feature Alignments , 2012, Proc. VLDB Endow..

[36]  Eamonn J. Keogh,et al.  Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping , 2012, KDD.

[37]  Eamonn J. Keogh,et al.  Logical-shapelets: an expressive primitive for time series classification , 2011, KDD.