Using dynamic time warping distances as features for improved time series classification

Dynamic time warping (DTW) has proven itself to be an exceptionally strong distance measure for time series. DTW in combination with one-nearest neighbor, one of the simplest machine learning methods, has been difficult to convincingly outperform on the time series classification task. In this paper, we present a simple technique for time series classification that exploits DTW’s strength on this task. But instead of directly using DTW as a distance measure to find nearest neighbors, the technique uses DTW to create new features which are then given to a standard machine learning method. We experimentally show that our technique improves over one-nearest neighbor DTW on 31 out of 47 UCR time series benchmark datasets. In addition, this method can be easily extended to be used in combination with other methods. In particular, we show that when combined with the symbolic aggregate approximation (SAX) method, it improves over it on 37 out of 47 UCR datasets. Thus the proposed method also provides a mechanism to combine distance-based methods like DTW with feature-based methods like SAX. We also show that combining the proposed classifiers through ensembles further improves the performance on time series classification.

[1]  Chris D. Nugent,et al.  The Application of Neural Networks in the Classification of the Electrocardiogram , 2002 .

[2]  Christos Faloutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[3]  Zachary Blanks,et al.  Ensemble Methods in Machine Learning: An Algorithmic Approach to Derive Distinctive Behaviors of Criminal Activity Applied to the Poaching Domain , 2017 .

[4]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[5]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[6]  Jason Lines,et al.  Classification of time series by shapelet transformation , 2013, Data Mining and Knowledge Discovery.

[7]  Yannis Manolopoulos,et al.  Feature-based classification of time-series data , 2001 .

[8]  Akira Hayashi,et al.  Embedding Time Series Data for Classification , 2005, MLDM.

[9]  Yuan Li,et al.  Rotation-invariant similarity in time series using bag-of-patterns representation , 2012, Journal of Intelligent Information Systems.

[10]  Li Wei,et al.  Fast time series classification using numerosity reduction , 2006, ICML.

[11]  Nick S. Jones,et al.  Highly Comparative Feature-Based Time-Series Classification , 2014, IEEE Transactions on Knowledge and Data Engineering.

[12]  Shashi Shekhar,et al.  Spatiotemporal change footprint pattern discovery: an inter‐disciplinary survey , 2014, WIREs Data Mining Knowl. Discov..

[13]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[14]  Juan José Rodríguez Diez,et al.  Interval and dynamic time warping-based decision trees , 2004, SAC '04.

[15]  Lei Chen,et al.  On The Marriage of Lp-norms and Edit Distance , 2004, VLDB.

[16]  George C. Runger,et al.  A Bag-of-Features Framework to Classify Time Series , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Tom Armstrong,et al.  Classification of Patients Using Novel Multivariate Time Series Representations of Physiological Data , 2011, 2011 10th International Conference on Machine Learning and Applications and Workshops.

[18]  Tong Zhang,et al.  An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods , 2001, AI Mag..

[19]  Mohak Shah,et al.  Evaluating Learning Algorithms: A Classification Perspective , 2011 .

[20]  Eamonn J. Keogh,et al.  Making Time-Series Classification More Accurate Using Learned Constraints , 2004, SDM.

[21]  Michael D. Todd,et al.  Automated Feature Design for Numeric Sequence Classification by Genetic Programming , 2015, IEEE Transactions on Evolutionary Computation.

[22]  Eamonn J. Keogh,et al.  DTW-D: time series semi-supervised learning from a single example , 2013, KDD.

[23]  Eamonn J. Keogh,et al.  Everything you know about Dynamic Time Warping is Wrong , 2004 .

[24]  G. W. Hughes,et al.  Minimum Prediction Residual Principle Applied to Speech Recognition , 1975 .

[25]  Eamonn J. Keogh,et al.  A Complexity-Invariant Distance Measure for Time Series , 2011, SDM.

[26]  Manfred J. Schmitt Computational Intelligence Processing in Medical Diagnosis , 2005, IEEE Transactions on Neural Networks.

[27]  Petra Perner,et al.  Machine Learning and Data Mining in Pattern Recognition , 2009, Lecture Notes in Computer Science.

[28]  Pierre Geurts,et al.  Pattern Extraction for Time Series Classification , 2001, PKDD.

[29]  Li Wei,et al.  Experiencing SAX: a novel symbolic representation of time series , 2007, Data Mining and Knowledge Discovery.

[30]  Eamonn J. Keogh,et al.  On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration , 2002, Data Mining and Knowledge Discovery.

[31]  Jason Lines,et al.  Time series classification with ensembles of elastic distance measures , 2015, Data Mining and Knowledge Discovery.

[32]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[33]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[34]  Donald J. Berndt,et al.  Using Dynamic Time Warping to Find Patterns in Time Series , 1994, KDD Workshop.

[35]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[36]  Dimitrios Gunopulos,et al.  Discovering similar multidimensional trajectories , 2002, Proceedings 18th International Conference on Data Engineering.

[37]  Thomas Philip Runarsson,et al.  Support vector machines and dynamic time warping for time series , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[38]  Eamonn J. Keogh,et al.  HOT SAX: efficiently finding the most unusual time series subsequence , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[39]  Akira Hayashi,et al.  Embedding of time series data by using dynamic time warping distances , 2006 .

[40]  Eamonn J. Keogh,et al.  Experimental comparison of representation methods and distance measures for time series data , 2010, Data Mining and Knowledge Discovery.

[41]  Andrew J. Vickers,et al.  What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics , 2009 .

[42]  Jian Pei,et al.  A brief survey on sequence classification , 2010, SKDD.

[43]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[44]  Eamonn J. Keogh,et al.  iSAX: indexing and mining terabyte sized time series , 2008, KDD.

[45]  Hui Ding,et al.  Querying and mining of time series data: experimental comparison of representations and distance measures , 2008, Proc. VLDB Endow..

[46]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[47]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[48]  LinesJason,et al.  Classification of time series by shapelet transformation , 2014 .

[49]  Sergey Malinchik,et al.  SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model , 2013, 2013 IEEE 13th International Conference on Data Mining.

[50]  Akira Hayashi,et al.  Embedding of time series data by using dynamic time warping distances , 2006, Systems and Computers in Japan.