Cardiovascular risk assessment using data mining inferencing and feature engineering techniques

Hectic lifestyles, heavier workloads and frequent fast-food consumption are eroding people’s health, and the number of patients suffering from cardiovascular diseases grows each year. Around the world, millions of people die each year from cardiovascular diseases. While the statistics are sobering, the vast amount of data now available about heart patients could help save millions of lives by detecting the risk at an early stage. With the recent advances in soft computing and fuzzy logic, various algorithmic approaches are being employed to tackle cardiovascular risk assessment through machine learning. Using machine learning classifiers such as Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and K-Nearest Neighbours (KNN), a model can be built to predict the risk accurately. In this paper, we analyse each of these methods both on the original features and with feature engineering techniques, namely transformation onto principal component axes and evaluation over different train-test folds, to find the best-performing model. The best model was found to be KNN in terms of all metrics and Logistic Regression in terms of accuracy.
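
The comparison described above could be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: it assumes scikit-learn, uses a synthetic dataset as a stand-in because the abstract does not name the heart-patient data, interprets "different train-test folds" as k-fold cross-validation with different values of k, and picks all hyperparameters (number of principal components, neighbours, trees) arbitrarily. The helper names `make_classifiers` and `compare` are hypothetical.

```python
# Sketch only: synthetic data stands in for the heart-patient dataset,
# which the abstract does not name. All hyperparameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_classifiers():
    """Return the six classifier families named in the abstract."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "NB": GaussianNB(),
        "SVM": SVC(),
        "DT": DecisionTreeClassifier(random_state=0),
        "RF": RandomForestClassifier(n_estimators=50, random_state=0),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }


def compare(X, y, folds=(5, 10), n_components=5):
    """Mean cross-validated accuracy per (classifier, PCA on/off, k folds)."""
    results = {}
    for name, clf in make_classifiers().items():
        for use_pca in (False, True):
            # Scale first so that PCA and distance-based models are not
            # dominated by features with large numeric ranges.
            steps = [StandardScaler()]
            if use_pca:
                steps.append(PCA(n_components=n_components))
            steps.append(clf)
            model = make_pipeline(*steps)
            for k in folds:
                # cross_val_score clones the pipeline for every fold.
                scores = cross_val_score(model, X, y, cv=k, scoring="accuracy")
                results[(name, use_pca, k)] = scores.mean()
    return results


# Hypothetical stand-in data: 300 samples, 13 features, binary label.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
results = compare(X, y)

# Show the three best (classifier, PCA, folds) settings by mean accuracy.
for key, acc in sorted(results.items(), key=lambda kv: -kv[1])[:3]:
    print(key, round(acc, 3))
```

Wrapping scaling and PCA in the pipeline means both are refit on each training fold, which avoids leaking information from the held-out fold. Other metrics such as precision, recall or F1 can be compared by changing the `scoring` argument.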
