Comparing different supervised machine learning algorithms for disease prediction

Background: Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently emerged as a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction.

Methods: In this study, extensive research efforts were made to identify those studies that applied more than one supervised machine learning algorithm to the prediction of a single disease. Two databases (i.e., Scopus and PubMed) were searched using different types of search items. From these searches, we selected 48 articles in total for the comparison among variants of supervised machine learning algorithms for disease prediction.

Results: We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which had the highest accuracy in 41% of the studies in which it was considered.

Conclusion: This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This information on relative performance can aid researchers in selecting an appropriate supervised machine learning algorithm for their studies.
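The percentages reported in the results are simple ratios: the number of studies in which an algorithm achieved the highest accuracy, divided by the number of studies in which it was applied. The following is a minimal sketch of that arithmetic. The function name `top_rate` is hypothetical, and the SVM win count is not stated in the abstract; the value computed below is only what the reported 41% of 29 studies would imply.

```python
def top_rate(times_best: int, times_applied: int) -> float:
    """Percentage of studies in which an algorithm had the highest accuracy."""
    if times_applied <= 0:
        raise ValueError("times_applied must be positive")
    return 100.0 * times_best / times_applied


# Stated in the abstract: RF was applied in 17 studies and was best in 9.
rf_rate = top_rate(9, 17)
print(f"RF was best in {rf_rate:.0f}% of the studies where it was applied")

# Assumption: the abstract gives only the SVM percentage (41% of 29 studies),
# so the win count here is inferred, not reported.
svm_implied_wins = round(0.41 * 29)
print(f"SVM: about {svm_implied_wins} of 29 studies "
      f"({top_rate(svm_implied_wins, 29):.0f}%)")
```

Normalizing by the number of studies in which each algorithm was applied matters here: SVM appears in more studies than RF, so comparing raw win counts alone would favor the more frequently used algorithm.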
