Big-data analytics for Arrhythmia Classification using data compression and kernel methods

Big data analytics is now widely used across research fields to discover and analyze hidden patterns and other useful information in large databases. Although Cardiac Arrhythmia Classification (CAC) has been studied in depth, new CAC methods are still needed. In this work, we propose a new big data analytics method for automatic CAC of intracardiac Electrograms (EGMs) stored in Implantable Cardioverter Defibrillators (ICDs). The proposed method combines the effectiveness of a measure based on data compression concepts (Jaccard dictionary similarity), which exploits the information shared among EGMs, with the classification power of kernel methods. It also requires minimal EGM preprocessing and can handle EGMs of different durations. A database of 6848 EGMs extracted from a national scientific big data service for ICDs, named the SCOOP platform, was used in our experiments. The performance of two classifiers (k-Nearest Neighbors, or k-NN, and Support Vector Machines, or SVM) was compared in two CAC scenarios using four different input spaces. Results showed that k-NN worked better than SVM when previous episodes from the same patient were available in the classifier design, and that SVM worked better than k-NN when they were not. In the best cases, k-NN and SVM yielded accuracies near 95% and 85%, respectively. These results suggest that the proposed method can be used as a high-quality big data service for CAC, supporting cardiologists in improving their knowledge of patient diagnosis.
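The idea of a Jaccard dictionary similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how EGMs are turned into symbols or how dictionaries are built, so the uniform amplitude quantization (`quantize`), the LZW-style phrase extraction (`lzw_dictionary`), and the 1-nearest-neighbor rule (`nearest_neighbor_label`) below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
def quantize(signal, levels=8):
    """Map each sample to one of `levels` letters (assumed uniform quantization)."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    return "".join(
        chr(ord("a") + min(levels - 1, int((x - lo) / span * levels)))
        for x in signal
    )


def lzw_dictionary(symbols):
    """Return the set of multi-symbol phrases an LZW-style parse adds."""
    phrases = set()
    current = ""
    for s in symbols:
        candidate = current + s
        # Single symbols are treated as already known, as in LZW.
        if len(candidate) == 1 or candidate in phrases:
            current = candidate
        else:
            phrases.add(candidate)
            current = s
    return phrases


def jaccard_similarity(dict_a, dict_b):
    """Jaccard index |A & B| / |A | B| between two phrase dictionaries."""
    union = dict_a | dict_b
    if not union:
        return 1.0  # convention chosen here: two empty dictionaries are identical
    return len(dict_a & dict_b) / len(union)


def similarity_matrix(signals):
    """Pairwise similarities; signals may have different lengths."""
    dicts = [lzw_dictionary(quantize(s)) for s in signals]
    return [[jaccard_similarity(a, b) for b in dicts] for a in dicts]


def nearest_neighbor_label(query, train_signals, train_labels):
    """1-NN rule: return the label of the most similar training signal."""
    q = lzw_dictionary(quantize(query))
    sims = [jaccard_similarity(q, lzw_dictionary(quantize(s))) for s in train_signals]
    best = max(range(len(sims)), key=sims.__getitem__)
    return train_labels[best]


if __name__ == "__main__":
    # Toy signals of different durations (not real EGM data).
    fast = [0, 1, 0, 1, 0, 1, 0, 1]
    slow = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
    print(similarity_matrix([fast, slow]))
    print(nearest_neighbor_label(slow, [fast, slow], ["class_1", "class_2"]))
```

Because the similarity is computed on sets of phrases rather than on aligned samples, signals of different lengths need no resampling or padding. The matrix returned by `similarity_matrix` is the kind of object that could be passed to a kernel classifier as a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, although whether the paper uses it in exactly that form is not stated in the abstract.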
