Feature extraction through parallel Probabilistic Principal Component Analysis for heart disease diagnosis

Automatic diagnosis of human diseases is mostly achieved through decision support systems. The performance of these systems depends mainly on selecting the most relevant features, which becomes harder when the dataset contains missing values for some of the features. Probabilistic Principal Component Analysis (PPCA) is well regarded for handling missing attribute values. This research presents a methodology that takes the results of medical tests as input, extracts a feature subset of reduced dimension, and provides a diagnosis of heart disease. The proposed methodology uses PPCA to extract high-impact features in a new projection: PPCA yields the projection vectors that account for the highest covariance, and these vectors are used to reduce the feature dimension. The projection vectors to retain are selected through Parallel Analysis (PA). The reduced feature subset is passed to a Support Vector Machine (SVM) with a radial basis function (RBF) kernel, which classifies each subject into one of two categories: Heart Patient (HP) or Normal Subject (NS). The methodology is evaluated in terms of accuracy, specificity and sensitivity on three UCI datasets: Cleveland, Switzerland and Hungarian. The results are presented in comparison with existing research to show the impact of the proposed technique, which achieved accuracies of 82.18%, 85.82% and 91.30% on the Cleveland, Hungarian and Switzerland datasets, respectively.
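The component-selection and evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes complete data (no missing values), uses an ordinary eigen-decomposition of the correlation matrix as a stand-in for PPCA, and applies Horn's form of Parallel Analysis with a 95th-percentile threshold. The function names (`parallel_analysis`, `binary_metrics`), the synthetic data and all parameter defaults are hypothetical choices made for illustration.

```python
import numpy as np


def parallel_analysis(X, n_iter=100, percentile=95, seed=0):
    """Horn-style Parallel Analysis (illustrative assumption, not the paper's code).

    Keeps the leading components whose correlation-matrix eigenvalues exceed
    the chosen percentile of eigenvalues obtained from random normal data of
    the same shape.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # eigvalsh returns ascending order, so reverse to get descending eigenvalues.
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    random_eigs = np.empty((n_iter, d))
    for i in range(n_iter):
        R = rng.standard_normal((n, d))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    keep = observed > threshold
    # Retain components up to (not including) the first one that fails the test.
    k = d if keep.all() else int(np.argmin(keep))
    return k, observed, threshold


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Synthetic demo data (hypothetical): 8 features driven by 2 latent factors.
rng_demo = np.random.default_rng(42)
Z = rng_demo.standard_normal((300, 2))
W = np.zeros((2, 8))
W[0, :4] = 1.0  # features 0-3 load on the first factor
W[1, 4:] = 1.0  # features 4-7 load on the second factor
X_demo = Z @ W + 0.3 * rng_demo.standard_normal((300, 8))

k, observed, threshold = parallel_analysis(X_demo)
print("components retained:", k)
print("metrics:", binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full pipeline along the lines of the abstract, the data projected onto the `k` retained components would then be passed to an RBF-kernel SVM. Sensitivity here is the fraction of Heart Patient cases correctly identified, and specificity is the fraction of Normal Subject cases correctly identified.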
