Accuracy of rule extraction using a recursive-rule extraction algorithm with continuous attributes combined with a sampling selection technique for the diagnosis of liver disease

Abstract Liver cancer is the second most common cause of cancer death worldwide, yet the diagnosis of liver disease remains difficult because extracted classification rules have limited accuracy and interpretability. In addition, hepatitis, an inflammation of the liver, can progress to fibrosis, cirrhosis, or even liver cancer. Numerous methods for diagnosing liver disease have been applied, but most current diagnostic methods are black box models that cannot adequately reveal the information hidden in the data. In the medical setting, extracted rules must be not only highly accurate but also highly interpretable. The Recursive-Rule eXtraction (Re-RX) algorithm is a white box model that generates highly accurate and interpretable classification rules from both discrete and continuous attributes; however, it tends to generate more rules than other rule extraction algorithms. The objectives of this study were (1) to use a new rule extraction algorithm, Continuous Re-RX combined with sampling selection techniques (Sampling-Continuous Re-RX), to obtain highly accurate and interpretable diagnostic rules for the BUPA and Hepatitis datasets, and (2) to quantify the associations of the presence and severity of ascites and of serum biomarkers with the risk of developing hepatitis, taking Child-Pugh scores into consideration. Sampling-Continuous Re-RX was compared with existing techniques and was found to extract more accurate, concise, and interpretable rules for the BUPA and Hepatitis datasets than previous rule extraction algorithms. In addition, the rules extracted using the proposed method were close to the trade-off curve, which indicated that they were more accurate and interpretable and therefore more suitable for the medical setting.
