Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance

This study investigates the effect of class imbalance in training data when developing neural network classifiers for computer-aided medical diagnosis. The investigation is performed in the presence of other characteristics typical of medical data, namely a small training sample size, a large number of features, and correlations between features. Two methods of neural network training are explored: classical backpropagation (BP) and particle swarm optimization (PSO) with clinically relevant training criteria. An experimental study is performed on simulated data, and the conclusions are further validated on real clinical data for breast cancer diagnosis. The results show that classifier performance deteriorates with even modest class imbalance in the training data. Further, BP is generally preferable to PSO for imbalanced training data, especially with small sample sizes and a large number of features. Finally, there is no clear preference between oversampling and the no-compensation approach, and some guidance is provided for choosing between them.
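A minimal sketch (not the authors' code) of the kind of comparison the abstract describes: an imbalanced training set is used either as-is (no compensation) or after random oversampling of the minority class, and a backpropagation-trained neural network is scored by the area under the ROC curve. The dataset sizes, network settings, and helper function are illustrative assumptions.

```python
# Illustrative sketch, assuming scikit-learn; sizes and settings are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated imbalanced data: few positive (diseased) cases, many features.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def oversample_minority(X, y):
    """Duplicate minority-class samples (with replacement) until the classes balance."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]

for name, (Xt, yt) in {
    "no compensation": (X_tr, y_tr),
    "oversampling": oversample_minority(X_tr, y_tr),
}.items():
    # Backpropagation-style training of a small multilayer perceptron.
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                        random_state=0).fit(Xt, yt)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```

The AUC on a held-out set is the clinically relevant figure of merit here; which compensation strategy wins varies with sample size, feature count, and the degree of imbalance, which is why the study reports no universal preference.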
