Features selection for arrhythmia diagnosis using Relief-F algorithm and support vector machine

Electrocardiography (ECG) is a standard clinical procedure for recording the electrical activity of the heart, and it is among the most widely available and effective methods for diagnosing cardiac arrhythmias. ECG signals may be classified as normal or abnormal based on the timing and potential of the electrical waves propagating through the heart muscle. Many advanced signal processing techniques have been used to extract numerical and logical information from these signals. The number of extracted features is usually very high, and some of them are redundant, irrelevant, or noisy. In this work, a thorough experimental study was conducted to reduce the number of ECG features, seeking a more compact representation of the samples by selecting the most informative features and removing the others. We empirically investigated the efficiency of two filter-based feature-selection algorithms for the diagnosis of cardiac arrhythmia: Relief-F and information gain. We used support vector machine (SVM) and logistic regression as classification models. Relief-F is a simple and effective filter-based feature-selection algorithm that evaluates the importance of a feature while accounting for its dependence on other features. Classification performance was measured with three statistical measures: classification accuracy, sensitivity, and specificity. Experimental results showed that the performance of Relief-F with SVM is promising for the diagnosis of cardiac arrhythmia.
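As a rough illustration of the ideas named in the abstract, the sketch below implements a simplified two-class Relief weighting and the three reported performance measures. It is not the authors' implementation: it uses the basic Relief update with a single nearest hit and nearest miss rather than full Relief-F (which averages over k neighbours and handles multiple classes), visits every sample once instead of drawing a random subset, and assumes numeric features already scaled to [0, 1]. The function names `relief_weights` and `binary_metrics` and the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Simplified two-class Relief (assumption: not full Relief-F).

    X: list of feature vectors with values scaled to [0, 1].
    y: list of class labels (two classes, each with at least two samples).
    Returns one weight per feature; larger means more relevant.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance over all features
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward features that differ on the nearest miss,
            # penalize features that differ on the nearest hit.
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / n
    return weights


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for a two-class problem.

    Assumes both classes occur in y_true (otherwise a ratio is undefined).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5],
         [0.8, 0.4], [0.9, 0.8], [1.0, 0.6]]
y_toy = [0, 0, 0, 1, 1, 1]
toy_weights = relief_weights(X_toy, y_toy)
```

In a filter-based pipeline of the kind described above, the features would be ranked by these weights, the top-ranked subset passed to a classifier such as an SVM, and the predictions scored with `binary_metrics`.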
