Understanding CNN Fragility When Learning With Imbalanced Data

Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes, and their decisions are hard to interpret. These problems are related: the mechanism by which a CNN generalizes to minority classes, the very thing that needs improvement, is hidden inside a black box. To demystify CNN decisions on imbalanced data, we focus on their latent features. Although a CNN embeds the pattern knowledge learned from a training set in its model parameters, the effect of that knowledge is captured in its feature and classification embeddings (FE and CE). These embeddings can be extracted from a trained model, and their global, class-level properties (e.g., frequency, magnitude, and identity) can be analyzed. We find that important information about a network's ability to generalize to minority classes resides in the class top-K CE and FE. We show that a CNN learns a limited number of class top-K CE per category, and that their number and magnitudes vary depending on whether the same class is balanced or imbalanced. This calls into question whether a CNN has learned intrinsic class features, or merely frequently occurring ones that happen to exist in the sampled class distribution. We also hypothesize that latent class diversity is as important as the number of class examples, which has important implications for resampling and cost-sensitive methods: these methods generally focus on rebalancing model weights, class counts, and margins, rather than diversifying class latent features through augmentation. Finally, we demonstrate that a CNN has difficulty generalizing to test data when the magnitudes of its top-K latent features do not match those of the training set. Our experiments use three popular image datasets and two cost-sensitive algorithms commonly employed in imbalanced learning.
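
To make the embedding analysis concrete, the sketch below (our illustration, not the paper's released code) shows one way to extract FE from a trained PyTorch model and tally the per-class top-K statistics the abstract refers to: the frequency, identity, and magnitude of the dimensions that dominate each example's embedding. The names `feature_layer`, `loader`, and `k` are assumptions for illustration, and the logits (CE) can be tallied in exactly the same way.

```python
# A minimal sketch, assuming a trained PyTorch classifier; not the authors' code.
import torch
from collections import Counter

@torch.no_grad()
def topk_class_stats(model, feature_layer, loader, num_classes, k=10, device="cpu"):
    """Tally, per class, which FE dimensions most often fall in the per-example
    top-K by magnitude, and collect those magnitudes."""
    cache = {}
    # Forward hook captures the feature embedding (FE) at the chosen layer.
    hook = feature_layer.register_forward_hook(
        lambda mod, inp, out: cache.update(fe=out.flatten(1)))
    fe_freq = [Counter() for _ in range(num_classes)]  # identity/frequency per class
    fe_mags = [[] for _ in range(num_classes)]         # magnitudes per class
    model.eval().to(device)
    for x, y in loader:
        _ = model(x.to(device))  # logits are the CE; they can be tallied like FE below
        vals, idxs = cache["fe"].topk(k, dim=1)  # per-example top-K FE dimensions
        for c, i, v in zip(y.tolist(), idxs.tolist(), vals.tolist()):
            fe_freq[c].update(i)   # which dimensions recur in this class's top-K
            fe_mags[c].extend(v)   # how strongly they activate
    hook.remove()
    return fe_freq, fe_mags

# Hypothetical usage with a torchvision ResNet-18, whose penultimate layer is
# the global average pool (512-dim FE after flattening):
# freq, mags = topk_class_stats(resnet18, resnet18.avgpool, test_loader, num_classes=10)
```

Under these assumptions, comparing `fe_freq` and `fe_mags` for the same class trained balanced versus imbalanced would expose whether the minority version relies on fewer, lower-magnitude latent features, which is the fragility the abstract describes.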
