A systematic study of the class imbalance problem in convolutional neural networks

In this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks since overall accuracy metric is associated with notable difficulties in the context of imbalanced data. Based on results from our experiments we conclude that (i) the effect of class imbalance on classification performance is detrimental; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that completely eliminates the imbalance, whereas the optimal undersampling ratio depends on the extent of imbalance; (iv) as opposed to some classical machine learning models, oversampling does not cause overfitting of CNNs; (v) thresholding should be applied to compensate for prior class probabilities when overall number of properly classified cases is of interest.

[1]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Mohammed Bennamoun,et al.  Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[3]  Ryutaro Tateishi,et al.  A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees , 2013 .

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[6]  Stan Matwin,et al.  Addressing the Curse of Imbalanced Training Sets: One-Sided Selection , 1997, ICML.

[7]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[8]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[9]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[10]  Ning Qian,et al.  On the momentum term in gradient descent learning algorithms , 1999, Neural Networks.

[11]  Hui Han,et al.  Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning , 2005, ICIC.

[12]  Claire Cardie,et al.  Improving Minority Class Prediction Using Case-Specific Feature Weights , 1997, ICML.

[13]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[14]  Yiyu Yao,et al.  Brain activation detection by neighborhood one-class SVM , 2007, Cognitive Systems Research.

[15]  Mark Goadrich,et al.  The relationship between Precision-Recall and ROC curves , 2006, ICML.

[16]  Salvatore J. Stolfo,et al.  Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection , 1998, KDD.

[17]  Gongping Yang,et al.  On the Class Imbalance Problem , 2008, 2008 Fourth International Conference on Natural Computation.

[18]  Jack Koplowitz,et al.  On the relation of performance to editing in nearest neighbor rules , 1981, Pattern Recognit..

[19]  Fernando Nogueira,et al.  Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning , 2016, J. Mach. Learn. Res..

[20]  Pei-Yi Hao,et al.  Fuzzy one-class support vector machines , 2008, Fuzzy Sets Syst..

[21]  Qingming Huang,et al.  Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks , 2015, ECCV.

[22]  Longbing Cao,et al.  Training deep neural networks on imbalanced data sets , 2016, 2016 International Joint Conference on Neural Networks (IJCNN).

[23]  Peter D. Turney Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm , 1994, J. Artif. Intell. Res..

[24]  H. Robbins A Stochastic Approximation Method , 1951 .

[25]  Antoine Geissbühler,et al.  Learning from imbalanced data in surveillance of nosocomial infection , 2006, Artif. Intell. Medicine.

[26]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[27]  Nathalie Japkowicz,et al.  The class imbalance problem: A systematic study , 2002, Intell. Data Anal..

[28]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[29]  Robert C. Holte,et al.  Cost-sensitive classifier evaluation , 2005, UBDM '05.

[30]  Robert C. Holte,et al.  C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling , 2003 .

[31]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[32]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[33]  I. Tomek,et al.  Two Modifications of CNN , 1976 .

[34]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[35]  Igor Kononenko,et al.  Cost-Sensitive Learning with Neural Networks , 1998, ECAI.

[36]  Jacek M. Zurada,et al.  Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance , 2008, Neural Networks.

[37]  Hong Gu,et al.  Anomaly detection combining one-class SVMs and particle swarm optimization algorithms , 2010 .

[38]  Ah Chung Tsoi,et al.  Neural Network Classification and Prior Class Probabilities , 1996, Neural Networks: Tricks of the Trade.

[39]  Zhi-Hua Zhou,et al.  Ieee Transactions on Knowledge and Data Engineering 1 Training Cost-sensitive Neural Networks with Methods Addressing the Class Imbalance Problem , 2022 .

[40]  Ron Kohavi,et al.  The Case against Accuracy Estimation for Comparing Induction Algorithms , 1998, ICML.

[41]  Gary M. Weiss Mining with rarity: a unifying framework , 2004, SKDD.

[42]  Robert P. W. Duin,et al.  Support vector domain description , 1999, Pattern Recognit. Lett..

[43]  Robert C. Holte,et al.  Cost curves: An improved method for visualizing classifier performance , 2006, Machine Learning.

[44]  Hoon Sohn,et al.  NOVELTY DETECTION USING AUTO-ASSOCIATIVE NEURAL NETWORK , 2001 .

[45]  Andrew Janowczyk,et al.  Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases , 2016, Journal of pathology informatics.

[46]  Jiawei Han,et al.  Text classification from positive and unlabeled documents , 2003, CIKM '03.

[47]  Hyoungjoo Lee,et al.  The Novelty Detection Approach for Different Degrees of Class Imbalance , 2006, ICONIP.

[48]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[49]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[50]  Nitesh V. Chawla,et al.  SMOTEBoost: Improving Prediction of the Minority Class in Boosting , 2003, PKDD.

[51]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[52]  Hsuan-Tien Lin,et al.  Cost-Aware Pre-Training for Multiclass Cost-Sensitive Deep Learning , 2015, IJCAI.

[53]  Jiri Matas,et al.  All you need is a good init , 2015, ICLR.

[54]  Seetha Hari,et al.  Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.

[55]  Frank Hutter,et al.  Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves , 2015, IJCAI.

[56]  David M. J. Tax,et al.  One-class classification , 2001 .

[57]  M. Maloof Learning When Data Sets are Imbalanced and When Costs are Unequal and Unknown , 2003 .

[58]  Tal Hassner,et al.  Age and gender classification using convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[59]  J Hilden,et al.  Regret graphs, diagnostic uncertainty and Youden's Index. , 1996, Statistics in medicine.

[60]  Wei Xu,et al.  Improving one-class SVM for anomaly detection , 2003, Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.03EX693).

[61]  David J. Hand,et al.  A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems , 2001, Machine Learning.

[62]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[63]  Niall M. Adams,et al.  Comparing classifiers when the misallocation costs are uncertain , 1999, Pattern Recognit..

[64]  Herna L. Viktor,et al.  Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach , 2004, SKDD.

[65]  Gustavo E. A. P. A. Batista,et al.  A study of the behavior of several methods for balancing machine learning training data , 2004, SKDD.

[66]  Yijing Li,et al.  Learning from class-imbalanced data: Review of methods and applications , 2017, Expert Syst. Appl..

[67]  José Salvador Sánchez,et al.  Restricted Decontamination for the Imbalanced Training Sample Problem , 2003, CIARP.

[68]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[69]  Pedro M. Domingos,et al.  Tree Induction for Probability-Based Ranking , 2003, Machine Learning.

[70]  Antônio de Pádua Braga,et al.  Artificial Neural Networks Learning in ROC Space , 2009, IJCCI.

[71]  Tom Fawcett,et al.  Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions , 1997, KDD.

[72]  R. A. Mollineda,et al.  The class imbalance problem in pattern classification and learning , 2009 .

[73]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[74]  Nitesh V. Chawla,et al.  Classification and knowledge discovery in protein databases , 2004, J. Biomed. Informatics.

[75]  John Langford,et al.  Cost-sensitive learning by cost-proportionate example weighting , 2003, Third IEEE International Conference on Data Mining.

[76]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[77]  Charles X. Ling,et al.  Data Mining for Direct Marketing: Problems and Solutions , 1998, KDD.

[78]  Yang Song,et al.  The iNaturalist Species Classification and Detection Dataset , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[79]  David J. Kriegman,et al.  Automated annotation of coral reef survey images , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[80]  Shehroz S. Khan,et al.  One-class classification: taxonomy of study and review of techniques , 2013, The Knowledge Engineering Review.

[81]  Nitesh V. Chawla,et al.  Data Mining for Imbalanced Datasets: An Overview , 2005, The Data Mining and Knowledge Discovery Handbook.

[82]  Zhi-Hua Zhou,et al.  Exploratory Under-Sampling for Class-Imbalance Learning , 2006, Sixth International Conference on Data Mining (ICDM'06).

[83]  Filiberto Pla,et al.  Editing Prototypes in the Finite Sample Size Case Using Alternative Neighbourhoods , 1998, SSPR/SPR.

[84]  Jerzy W. Grzymala-Busse,et al.  An Approach to Imbalanced Data Sets Based on Changing Rule Strength , 2004, Rough-Neural Computing: Techniques for Computing with Words.

[85]  Malik Yousef,et al.  A comparison study between one-class and two-class machine learning for MicroRNA target detection , 2010 .

[86]  Yang Liu,et al.  One-Class Support Vector Machine Calibration Using Particle Swarm Optimisation , 2007 .

[87]  Fredric C. Gey,et al.  The Relationship between Recall and Precision , 1994, J. Am. Soc. Inf. Sci..

[88]  Adam Kowalczyk,et al.  Extreme re-balancing for SVMs: a case study , 2004, SKDD.

[89]  YoungSik Choi,et al.  Video Summarization Using Fuzzy One-Class Support Vector Machine , 2004, ICCSA.

[90]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[91]  Quoc V. Le,et al.  Large-Scale Evolution of Image Classifiers , 2017, ICML.

[92]  Stefan Wermter,et al.  Towards Effective Classification of Imbalanced Data with Convolutional Neural Networks , 2016, ANNPR.

[93]  Padraig Cunningham,et al.  The problem of bias in training data in regression problems in medical decision support , 2002, Artif. Intell. Medicine.

[94]  Taghi M. Khoshgoftaar,et al.  Experimental perspectives on learning from imbalanced data , 2007, ICML '07.

[95]  M. Arozullah,et al.  Data compression and novelty filtering in retinotopic backpropagation networks , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[96]  Nathalie Japkowicz,et al.  Nonlinear Autoassociation Is Not Equivalent to PCA , 2000, Neural Computation.

[97]  Rémi Gilleron,et al.  Positive and Unlabeled Examples Help Learning , 1999, ALT.

[98]  Taeho Jo,et al.  Class imbalances versus small disjuncts , 2004, SKDD.

[99]  R. Barandelaa,et al.  Strategies for learning in class imbalance problems , 2003, Pattern Recognit..

[100]  Jasper Snoek,et al.  Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[101]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[102]  Yang Song,et al.  The iNaturalist Challenge 2017 Dataset , 2017, ArXiv.

[103]  Robert P. W. Duin,et al.  Uniform Object Generation for Optimizing One-class Classifiers , 2002, J. Mach. Learn. Res..

[104]  Lewis D. Griffin,et al.  Detection of concealed cars in complex cargo X-ray imagery using deep learning , 2016, Journal of X-ray science and technology.

[105]  Damminda Alahakoon,et al.  Minority report in fraud detection: classification of skewed data , 2004, SKDD.

[106]  Christopher Joseph Pal,et al.  Brain tumor segmentation with Deep Neural Networks , 2015, Medical Image Anal..

[107]  Caroline Petitjean,et al.  A Random Forest Based Approach for One Class Classification in Medical Imaging , 2012, MLMI.

[108]  Robert P. W. Duin,et al.  Data domain description using support vectors , 1999, ESANN.

[109]  Dennis L. Wilson,et al.  Asymptotic Properties of Nearest Neighbor Rules Using Edited Data , 1972, IEEE Trans. Syst. Man Cybern..

[110]  Ting Liu,et al.  Recent advances in convolutional neural networks , 2015, Pattern Recognit..

[111]  Kun-Huang Chen,et al.  A hybrid classifier combining SMOTE with PSO to estimate 5-year survivability of breast cancer patients , 2014, Appl. Soft Comput..

[112]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[113]  Vipin Kumar,et al.  Evaluating boosting algorithms to classify rare classes: comparison and improvements , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[114]  Richard Lippmann,et al.  Neural Network Classifiers Estimate Bayesian a posteriori Probabilities , 1991, Neural Computation.

[115]  Steven J. Mullen,et al.  Multiclass ROC Analysis , 2009 .

[116]  Bernhard Schölkopf,et al.  Support Vector Method for Novelty Detection , 1999, NIPS.

[117]  Stan Matwin,et al.  Machine Learning for the Detection of Oil Spills in Satellite Radar Images , 1998, Machine Learning.

[118]  Nathalie Japkowicz,et al.  A Novelty Detection Approach to Classification , 1995, IJCAI.

[119]  Joachim Denzler,et al.  ImageNet pre-trained models with batch normalization , 2016, ArXiv.