Homogeneous and heterogeneous ensemble classification methods in diabetes disease: a review

This paper reviews the use of ensemble classification methods for diabetes. The analysis formulates and answers seven research questions, covering: publication trends, channels and venues; medical tasks undertaken; ensemble types proposed; single techniques used to construct the ensembles; rules used to combine member outputs into the ensemble output; datasets used to build and evaluate the ensembles; and tools used. A total of 107 papers were retained after the study selection process. Ensemble methods were first applied to diabetes in 2003. All medical tasks related to diabetes were investigated, and diagnosis was the task most frequently addressed with ensemble methods. Homogeneous ensembles were the most common type in the literature. Decision trees were the most used technique for building homogeneous ensembles, and support vector machines the most used for heterogeneous ensembles. The most frequent combiner was the majority voting rule. Our findings suggest that ensemble classification methods yield better accuracy than single classifiers. Confirming this, however, requires aggregating the evidence reported in the literature through a systematic literature review.
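To make the majority voting rule concrete, the following is a minimal sketch in Python and is not taken from any of the reviewed papers. It assumes each ensemble member is simply a function that maps a feature vector to a class label. The helper names (`make_stump`, `majority_vote`, `ensemble_predict`), the feature order (glucose, BMI, age), and the thresholds are all hypothetical and have no clinical meaning. The three members are one-feature threshold rules of the same kind, which loosely mirrors a homogeneous ensemble; a heterogeneous ensemble would mix members from different algorithm families behind the same interface.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Assumption: a classifier is any callable from a feature vector to a label.
Classifier = Callable[[Sequence[float]], Hashable]


def make_stump(feature_index: int, threshold: float) -> Classifier:
    """Build a one-feature threshold rule (a hypothetical stand-in for a tree)."""
    def stump(sample: Sequence[float]) -> int:
        # Predict class 1 when the chosen feature exceeds the threshold.
        return 1 if sample[feature_index] > threshold else 0
    return stump


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label that was voted first."""
    if not labels:
        raise ValueError("majority_vote needs at least one label")
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


def ensemble_predict(members: Sequence[Classifier], sample: Sequence[float]) -> Hashable:
    """Collect one vote per member and combine the votes by majority voting."""
    return majority_vote([member(sample) for member in members])


# Illustrative members over (glucose, BMI, age); thresholds are made up.
members = [make_stump(0, 125.0), make_stump(1, 30.0), make_stump(2, 50.0)]

if __name__ == "__main__":
    # Two of the three members vote for class 1 on this sample.
    print(ensemble_predict(members, (140.0, 33.0, 35.0)))
```

The tie-breaking choice here (first label voted wins) is only one option; an odd number of members for a two-class problem avoids ties altogether.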
