Breast cancer diagnosis using GA feature selection and Rotation Forest

Breast cancer is one of the primary causes of death among women worldwide, and accurate diagnosis is one of the most significant steps in breast cancer treatment. Data mining techniques can support doctors in the diagnostic decision-making process. In this paper, we present different data mining techniques for the diagnosis of breast cancer. Two different Wisconsin Breast Cancer datasets are used to evaluate the proposed system, which has two stages. In the first stage, genetic algorithms (GA) are used to eliminate insignificant features and extract the informative and significant ones. This step reduces the computational complexity and speeds up the data mining process. In the second stage, several data mining techniques are employed to classify subjects into two categories: with or without breast cancer. Both individual classifiers and multiple classifier systems are used in this stage in order to construct an accurate system for breast cancer classification. The performance of the methods is evaluated using classification accuracy, area under the receiver operating characteristic (ROC) curve, and F-measure. The Rotation Forest model with 14 GA-selected features achieves the highest classification accuracy (99.48%), and the proposed approach improves on the performance reported in previous works. The results obtained in this study have the potential to open new opportunities in the diagnosis of breast cancer.
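The two-stage idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system evaluated in the paper: the data are synthetic (three informative features plus seven noise features) rather than the Wisconsin datasets, a simple nearest-centroid classifier stands in for Rotation Forest and the other classifiers, and all function names and GA settings (`ga_select`, population size, mutation rate, the per-feature penalty) are hypothetical choices made for the example.

```python
import random

def make_data(n=300, n_informative=3, n_noise=7, seed=0):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so every slice is class-balanced
        row = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def scores(X_tr, y_tr, X_te, mask):
    """Nearest-centroid score per test row; positive means closer to class 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    cents = []
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        cents.append([sum(r[j] for r in rows) / len(rows) for j in cols])
    def dist(x, cent):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, cent))
    return [dist(x, cents[0]) - dist(x, cents[1]) for x in X_te]

def accuracy(y_true, s):
    return sum((v > 0) == bool(t) for t, v in zip(y_true, s)) / len(y_true)

def f_measure(y_true, s):
    tp = sum(1 for t, v in zip(y_true, s) if t == 1 and v > 0)
    fp = sum(1 for t, v in zip(y_true, s) if t == 0 and v > 0)
    fn = sum(1 for t, v in zip(y_true, s) if t == 1 and v <= 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc(y_true, s):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [v for t, v in zip(y_true, s) if t == 1]
    neg = [v for t, v in zip(y_true, s) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def ga_select(X_tr, y_tr, X_val, y_val, pop_size=20, generations=15, p_mut=0.1, seed=1):
    """Stage 1: evolve a 0/1 feature mask that maximizes validation accuracy."""
    rng = random.Random(seed)
    n = len(X_tr[0])

    def fitness(mask):
        if not any(mask):
            return -1.0  # an empty feature set is never acceptable
        acc = accuracy(y_val, scores(X_tr, y_tr, X_val, mask))
        return acc - 0.01 * sum(mask)  # small penalty per kept feature

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            nxt.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = nxt
        best = max(pop, key=fitness)
    return best

X, y = make_data()
X_tr, y_tr = X[:100], y[:100]
X_val, y_val = X[100:200], y[100:200]
X_te, y_te = X[200:], y[200:]

mask = ga_select(X_tr, y_tr, X_val, y_val)   # stage 1: feature selection
s = scores(X_tr, y_tr, X_te, mask)           # stage 2: classify held-out data
print("selected features:", [j for j, keep in enumerate(mask) if keep])
print("accuracy:", accuracy(y_te, s), "F-measure:", f_measure(y_te, s), "AUC:", auc(y_te, s))
```

The fitness function is evaluated on a validation split that is separate from the test split, so the three reported metrics are not measured on data the GA has already seen. Swapping in a stronger classifier only requires replacing `scores`.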
