A cost-sensitive ensemble classifier for breast cancer classification

Breast cancer is the most commonly diagnosed form of cancer in women. Pattern classification approaches often struggle with breast cancer-related datasets because the available training data are typically imbalanced, with many more benign cases recorded than malignant ones. This imbalance biases the classifier towards the benign class and results in insufficient sensitivity. In this paper, we present an ensemble classification algorithm that addresses this problem. It employs cost-sensitive decision trees as base classifiers, which are trained on random feature subspaces to ensure diversity, and uses an evolutionary algorithm for simultaneous classifier selection and fusion. Experimental results on two different breast cancer datasets confirm that our approach works well and provides improved sensitivity compared to various other state-of-the-art ensembles.
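The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tree induction method, the cost matrix, or the evolutionary operators, so everything concrete here is an assumption. In particular, depth-1 trees (stumps) stand in for full cost-sensitive decision trees, the 5:1 false-negative to false-positive cost ratio is arbitrary, the data are synthetic, and the names `fit_stump`, `fuse`, and `evolve_weights` are hypothetical. A weight of zero removes a base classifier from the vote, which is how a single weight vector can encode both selection and fusion.

```python
import random

FN_COST, FP_COST = 5.0, 1.0  # assumed costs: a missed malignant case is 5x worse

def cost(y_true, y_pred):
    """Total misclassification cost (label 1 = malignant, 0 = benign)."""
    return sum(FN_COST if t == 1 and p == 0 else FP_COST if t == 0 and p == 1 else 0.0
               for t, p in zip(y_true, y_pred))

def fit_stump(X, y, features):
    """Depth-1 tree restricted to a feature subspace, chosen to minimise cost."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [int(sign * (row[f] - thr) > 0) for row in X]
                c = cost(y, pred)
                if best is None or c < best[0]:
                    best = (c, f, thr, sign)
    return best[1:]  # (feature index, threshold, polarity)

def stump_predict(stump, row):
    f, thr, sign = stump
    return int(sign * (row[f] - thr) > 0)

def fuse(votes_row, weights):
    """Weighted vote over base classifiers; a zero weight deselects one."""
    total = sum(weights)
    if total == 0:
        return 0
    score = sum(w * v for w, v in zip(weights, votes_row)) / total
    return int(score >= 0.5)

def evolve_weights(votes, y, rng, pop_size=20, generations=30):
    """Simple elitist evolutionary search over fusion weights in [0, 1]."""
    n = len(votes[0])
    fitness = lambda w: cost(y, [fuse(v, w) for v in votes])
    # Seed the population with the uniform vote plus random weight vectors.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        for p in parents:
            child = [min(1.0, max(0.0, w + rng.gauss(0, 0.2))) for w in p]
            # Small weights are snapped to zero, i.e. the classifier is dropped.
            children.append([0.0 if w < 0.1 else w for w in child])
        pop = parents + children
    return min(pop, key=fitness)

def make_data(rng, n_benign=100, n_malignant=20, n_features=6):
    """Synthetic imbalanced data: the first three features are informative."""
    X, y = [], []
    for label, n in ((0, n_benign), (1, n_malignant)):
        for _ in range(n):
            X.append([rng.gauss(float(label) if f < 3 else 0.0, 1.0)
                      for f in range(n_features)])
            y.append(label)
    return X, y

rng = random.Random(0)
X, y = make_data(rng)
# Each base classifier sees a random subspace of 3 out of 6 features.
stumps = [fit_stump(X, y, rng.sample(range(6), 3)) for _ in range(10)]
votes = [[stump_predict(s, row) for s in stumps] for row in X]
weights = evolve_weights(votes, y, rng)
pred = [fuse(v, weights) for v in votes]
sensitivity = sum(p for t, p in zip(y, pred) if t == 1) / sum(y)
print("selected classifiers:", sum(w > 0 for w in weights), "of", len(stumps))
print("training sensitivity:", round(sensitivity, 3))
```

Because the search is elitist and starts from the uniform vote, the evolved weights never incur a higher training cost than plain majority voting. The sketch evaluates on the training data only; a real experiment would measure sensitivity on held-out folds.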
