Use of a Recursive-Rule eXtraction algorithm with J48graft to achieve highly accurate and concise rule extraction from a large breast cancer dataset

Abstract To assist physicians in diagnosing breast cancer and thereby improve survival, a highly accurate computer-aided diagnostic system is necessary. Although various machine learning and data mining approaches have been devised to increase diagnostic accuracy, most current methods are inadequate. The recently developed Recursive-Rule eXtraction (Re-RX) algorithm considers discrete variables hierarchically and recursively before analyzing continuous data, and can generate classification rules from data containing both discrete and continuous attributes. The objective of this study was to extract highly accurate, concise, and interpretable classification rules for diagnosis using the Re-RX algorithm with J48graft, a class for generating a grafted C4.5 decision tree. We used the Wisconsin Breast Cancer Dataset (WBCD). Nine research groups have provided 10 kinds of highly accurate concrete classification rules for the WBCD. We compared the accuracy and characteristics of the rule set generated for the WBCD using the Re-RX algorithm with J48graft against those of five rule sets obtained using 10-fold cross validation (CV). We applied the Re-RX algorithm with J48graft to the WBCD and obtained the average classification accuracies over 10 runs of 10-fold CV for the training and test datasets, the number of extracted rules, and the average number of antecedents. Compared with other rule extraction algorithms, the Re-RX algorithm with J48graft yielded a lower average number of rules for diagnosing breast cancer, which is a substantial advantage. It also provided the lowest average number of antecedents per rule. These features are expected to greatly aid physicians in making accurate and concise diagnoses for patients with breast cancer.
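
The evaluation protocol described in the abstract (10 runs of 10-fold CV, plus the number of rules and the average number of antecedents per rule) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `repeated_kfold` and `rule_set_stats`, the rule representation, and the example rules are all assumptions introduced for illustration, and the splits are plain shuffled folds rather than whatever splitting scheme the study actually used.

```python
import random
from statistics import mean


def repeated_kfold(n_samples, k=10, runs=10, seed=0):
    """Yield (run, fold, train_idx, test_idx) for `runs` repetitions of k-fold CV.

    Assumption: simple shuffled (non-stratified) folds.
    """
    rng = random.Random(seed)
    for run in range(runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # fresh shuffle for each run
        folds = [idx[i::k] for i in range(k)]  # k folds of nearly equal size
        for f, test_idx in enumerate(folds):
            # Training set = every fold except the held-out one.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield run, f, train_idx, test_idx


def rule_set_stats(rules):
    """Return (number of rules, average number of antecedents per rule).

    Each rule is a pair (list_of_antecedents, class_label).
    """
    antecedent_counts = [len(antecedents) for antecedents, _ in rules]
    return len(rules), mean(antecedent_counts)


# Hypothetical rule set, made up for illustration only (not taken from the paper).
example_rules = [
    (["attr_a <= 2", "attr_b <= 3"], "benign"),
    (["attr_a > 2"], "malignant"),
]

n_rules, avg_antecedents = rule_set_stats(example_rules)
print(n_rules, avg_antecedents)  # 2 1.5
```

In such a setup, a rule extractor would be trained on each `train_idx` and scored on both the training and test indices, with accuracies and rule-set statistics averaged over all 100 splits.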
