Discovery of Important Subsequences in Electrocardiogram Beats Using the Nearest Neighbour Algorithm

The classification of time series data is a well-studied problem with numerous practical applications, such as medical diagnosis and speech recognition. A popular and effective approach is to classify a new time series in the same way as its nearest neighbours, where proximity is defined using Dynamic Time Warping (DTW) distance, a measure analogous to sequence alignment in bioinformatics. However, practitioners are interested not only in accurate classification but also in why a time series is classified a certain way. To this end, we introduce the problem of finding a minimum-length subsequence of a time series whose removal changes the outcome of the classification under the nearest neighbour algorithm with DTW distance. Informally, such a subsequence is expected to be relevant for the classification and can help practitioners interpret the outcome. We describe a simple but optimized implementation for detecting these subsequences and define an accompanying measure that quantifies the relevance of every time point in the time series for the classification. In tests on electrocardiogram data, we show that the algorithm allows the discovery of important subsequences and can help detect abnormalities in cardiac rhythms that distinguish sick from healthy patients.
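
The problem statement above can be illustrated with a minimal brute-force sketch in Python. It is not the optimized implementation the abstract refers to, and several details are assumptions made only for illustration: a single nearest neighbour (1-NN), an absolute-difference local cost inside DTW, and "removal" interpreted as deleting a contiguous block of points and joining the remaining ends. The function names `dtw_distance`, `classify_1nn`, and `shortest_label_changing_removal`, as well as the toy data, are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Series = Sequence[float]
Labelled = Tuple[Series, Hashable]


def dtw_distance(a: Series, b: Series) -> float:
    """Unconstrained DTW in O(len(a) * len(b)) time, absolute-difference cost (assumed)."""
    inf = float("inf")
    # prev[j] holds the best cumulative cost of aligning the rows seen so far with b[:j].
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def classify_1nn(query: Series, train: List[Labelled]) -> Hashable:
    """Return the label of the training series closest to `query` under DTW."""
    return min(train, key=lambda item: dtw_distance(query, item[0]))[1]


def shortest_label_changing_removal(
    query: Series, train: List[Labelled]
) -> Optional[Tuple[int, int]]:
    """Return (start, length) of a shortest contiguous block whose deletion
    changes the 1-NN label, or None if no such block exists.

    Lengths are tried in increasing order, so the first hit is minimal.
    """
    original = classify_1nn(query, train)
    n = len(query)
    for length in range(1, n):  # always keep at least one point
        for start in range(n - length + 1):
            reduced = list(query[:start]) + list(query[start + length:])
            if classify_1nn(reduced, train) != original:
                return start, length
    return None


# Toy data (hypothetical): a flat "normal" beat and an "abnormal" beat with a bump.
train = [([0, 0, 0, 0, 0, 0], "normal"), ([0, 0, 5, 5, 0, 0], "abnormal")]
query = [0, 0, 0, 5, 5, 0, 0]
print(classify_1nn(query, train))
print(shortest_label_changing_removal(query, train))
```

This exhaustive search evaluates every contiguous block for every length, so it is only practical for short series such as single beats; it is meant to make the definition concrete rather than to suggest how an efficient search should be organized.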
