A rule extraction approach from support vector machines for diagnosing hypertension among diabetics

Abstract: Diabetes mellitus is a major non-communicable disease whose prevalence continues to rise, making it an epidemic and a public health crisis worldwide. One of the several life-threatening complications of diabetes is hypertension, or high blood pressure, which mostly remains undiagnosed and untreated until symptoms become severe. Diabetic complications can be greatly reduced or prevented by early detection of individuals at risk. In the recent past, several machine learning classification algorithms have been widely applied to diagnosing diabetes, but very few studies have addressed the detection of hypertension among diabetic subjects. Specifically, existing rule-based models fail to produce comprehensible rule sets. To resolve this limitation, this paper develops a hybrid approach for extracting rules from support vector machines (SVMs). A feature selection mechanism is introduced to select significantly associated features from the dataset. XGBoost is used to convert the SVM black-box model into a comprehensible decision-making tool. A new dataset was obtained from Pt. JNM Medical College, Raipur, India, comprising 300 diabetic subjects, of whom 108 are hypertensive and 192 are normotensive. In addition, five public diabetes-related datasets are used to assess the generalization of the results. Experiments reveal that the proposed model outperforms ten other benchmark classifiers. The Friedman rank test and the post hoc Bonferroni-Dunn test demonstrate the statistical significance of the proposed method's advantage over the others.
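The statistical comparison named in the abstract can be illustrated with a minimal sketch. It assumes the standard Friedman chi-square statistic computed from average ranks and the usual critical-difference formula used with the Bonferroni-Dunn test. It is not the authors' code, the function names are hypothetical, and the scores are toy values chosen for illustration, not results from the paper.

```python
import math
from typing import List


def average_ranks(scores: List[List[float]]) -> List[float]:
    """Mean rank of each classifier over all datasets.

    scores[i][j] is the score of classifier j on dataset i (higher is better).
    Rank 1 is the best; tied classifiers share the average of their ranks.
    """
    n_datasets = len(scores)
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            # Extend the run while the next classifier has the same score.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # average of 1-based ranks in the run
            for idx in order[pos:end + 1]:
                totals[idx] += shared
            pos = end + 1
    return [t / n_datasets for t in totals]


def friedman_statistic(scores: List[List[float]]) -> float:
    """Friedman chi-square statistic (no tie correction applied)."""
    n = len(scores)
    k = len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4
    )


def critical_difference(q_alpha: float, k: int, n: int) -> float:
    """Critical difference in average rank for a post hoc comparison.

    q_alpha must be looked up in a table for the chosen test and
    significance level; it is deliberately not hard-coded here.
    """
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Toy accuracies: 4 datasets (rows) x 3 classifiers (columns). Illustrative only.
toy_scores = [
    [0.90, 0.85, 0.80],
    [0.88, 0.86, 0.79],
    [0.91, 0.84, 0.84],
    [0.87, 0.89, 0.81],
]
print(average_ranks(toy_scores))
print(friedman_statistic(toy_scores))
```

For these toy scores the average ranks are 1.25, 1.875 and 2.875, and the Friedman statistic is 5.375. In a Bonferroni-Dunn comparison, a classifier would be judged significantly different from the control method when their average ranks differ by more than the critical difference.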
