Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method

Worldwide, breast cancer is one of the most serious threats to the lives of middle-aged women. Breast cancer diagnosis aims to classify a detected breast tumor as benign or malignant. Recent developments in data mining have produced new model structures and algorithms that help medical workers improve classification accuracy. This study proposes a model that combines an ensemble method with an imbalanced learning technique for the classification of breast cancer data. First, the Synthetic Minority Over-sampling Technique (SMOTE), an imbalanced learning algorithm, is applied to the selected datasets. Second, multiple baseline classifiers are tuned by Bayesian optimization. Finally, a stacking ensemble combines the optimized classifiers to produce the final decision. Comparative analysis on two widely used breast cancer datasets shows that the proposed model achieves better performance and adaptivity than conventional methods in terms of classification accuracy, specificity, and AuROC, supporting the clinical value of the model.
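The oversampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is an illustration written in pure Python under stated assumptions, not the implementation used in the study. The function names `smote` and `balance`, the toy two-feature data, and the choice of the malignant class ("M") as the minority are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than peers
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority samples, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.
    Assumes a binary problem, as in benign vs. malignant classification."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_needed = max(n_majority - len(minority), 0)
    new_points = smote(minority, n_needed, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data: 3 malignant ("M") and 6 benign ("B") samples.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0],
     [6.0, 6.0], [5.5, 5.5], [5.2, 5.8]]
y = ["M", "M", "M", "B", "B", "B", "B", "B", "B"]

X_bal, y_bal = balance(X, y, minority_label="M", k=2)
print(y_bal.count("M"), y_bal.count("B"))
```

In a full pipeline of the kind the abstract describes, the balanced training set would then be passed to the tuned baseline classifiers and the stacking ensemble. Oversampling is normally applied to the training split only, so that evaluation data contains no synthetic samples.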
