Analysis and classification of heart diseases using heartbeat features and machine learning algorithms

This study proposes an electrocardiogram (ECG) classification approach that applies machine learning to several ECG features. An ECG is a signal that measures the electrical activity of the heart. The approach is implemented in Scala on the Apache Spark framework using MLlib, Apache Spark’s scalable machine learning library. The key challenge in ECG classification is handling irregularities in the ECG signals, which is essential for detecting the patient's status. We therefore propose an efficient approach that classifies ECG signals with high accuracy. Each heartbeat is a combination of action impulse waveforms produced by different specialized cardiac tissues. Heartbeat classification is difficult because these waveforms differ from one person to another. The waveforms are described by a set of features, and these features are the inputs to the machine learning algorithm. In general, Spark–Scala tools simplify the use of many algorithms, including machine learning (ML) algorithms. In addition, Spark–Scala is preferred over other tools when the volume of data to be processed is very large. In our case, we used a dataset of 205,146 records to evaluate the performance of our approach. The machine learning libraries in Spark–Scala make it easy to implement many classification algorithms (Decision Tree, Random Forest, Gradient-Boosted Trees (GDB), etc.). The proposed method is evaluated and validated on the baseline MIT-BIH Arrhythmia and MIT-BIH Supraventricular Arrhythmia databases. For binary classification, our approach achieved an overall accuracy of 96.75% using the GDB tree algorithm and 97.98% using Random Forest. For multi-class classification, it achieved 98.03% accuracy using Random Forest; the GDB tree algorithm supports only binary classification.
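The overall-accuracy figures reported above can be illustrated with a minimal sketch. It is written in plain Python for brevity and is not the authors' Spark–Scala implementation. The beat labels (`N`, `V`, `S`), the toy predictions, and the rule that collapses classes into normal versus abnormal are all assumptions made for illustration; the abstract does not state how the binary labels were derived.

```python
def overall_accuracy(true_labels, predicted_labels):
    """Fraction of heartbeats whose predicted class equals the annotated class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def to_binary(labels, normal_label="N"):
    """Collapse multi-class beat labels into normal (0) vs. abnormal (1).

    Assumption: every class other than `normal_label` counts as abnormal.
    """
    return [0 if label == normal_label else 1 for label in labels]


# Hypothetical annotations and classifier outputs for eight heartbeats.
y_true = ["N", "N", "V", "S", "N", "V", "N", "S"]
y_pred = ["N", "N", "V", "N", "N", "S", "S", "S"]

# Multi-class accuracy: a V beat predicted as S counts as an error.
multi_acc = overall_accuracy(y_true, y_pred)

# Binary accuracy: V and S are both "abnormal", so that confusion is not an error.
binary_acc = overall_accuracy(to_binary(y_true), to_binary(y_pred))

print(f"multi-class accuracy: {multi_acc:.3f}")
print(f"binary accuracy:      {binary_acc:.3f}")
```

On this toy data the binary score is higher than the multi-class score because confusions between two abnormal classes disappear once the labels are collapsed. The same predictions can therefore yield different accuracies in the two settings, which is why the binary and multi-class figures are reported separately.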
