Improving Model Accuracy for Imbalanced Image Classification Tasks by Adding a Final Batch Normalization Layer: An Empirical Study

Some real-world domains, such as agriculture and healthcare, involve early-stage disease indications that are rarely recorded, yet whose precise detection at that stage is critical. In this type of highly imbalanced classification problem, which also involves complex visual features, deep learning (DL) is attractive because of its strong detection capability. In practice, however, DL models tend to favor the majority class over the minority class and consequently detect the targeted early-stage indications poorly. To simulate such scenarios, we artificially skew (99% vs. 1%) certain plant types in the PlantVillage dataset and use transfer learning to classify the resulting scarce visual cues. As a base experiment, we randomly and unevenly sample healthy and unhealthy images from selected plant types to form a training set, fine-tune ResNet34 and VGG19 architectures on it, and then test the models on a balanced set of healthy and unhealthy images. We empirically observe that the F1 test score for the minority class jumps from 0.29 to 0.95 when a final Batch Normalization (BN) layer is added just before the output layer of VGG19. We demonstrate that adding this BN layer before the output layer of modern CNN architectures considerably reduces both training time and test error for minority classes on highly imbalanced datasets. Moreover, when the final BN layer is employed, minimizing the loss function is not necessarily the best way to obtain a high minority-class F1 test score: the network may perform better even when it is not confident in its predictions, which raises the broader question of why the softmax output is not a reliable uncertainty measure for DL models.
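
The modification studied here is small in code terms. Below is a minimal sketch, assuming PyTorch and torchvision (the paper does not specify its framework), of fine-tuning an ImageNet-pretrained VGG19 with an extra BatchNorm layer inserted just before the output layer; the class count, frozen backbone, optimizer, and learning rate are illustrative choices, not the authors' exact configuration.

```python
# Hedged sketch: VGG19 transfer learning with a "final BN" layer before the output.
# Assumes torchvision >= 0.13 (weights enum API); all hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # healthy vs. unhealthy (minority) class

# Load an ImageNet-pretrained VGG19 and freeze the convolutional backbone,
# so that only the classifier head is fine-tuned.
model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
for param in model.features.parameters():
    param.requires_grad = False

# Replace the last linear layer with BatchNorm1d followed by a new output layer.
in_features = model.classifier[-1].in_features  # 4096 for VGG19
model.classifier[-1] = nn.Sequential(
    nn.BatchNorm1d(in_features),          # the final BN layer studied in the paper
    nn.Linear(in_features, NUM_CLASSES),  # new output layer for the 2-class task
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

The same idea applies to ResNet34 by wrapping `model.fc` in an analogous `nn.Sequential(nn.BatchNorm1d(...), nn.Linear(...))`; the only structural change relative to standard fine-tuning is the normalization of the penultimate activations right before the softmax/output layer.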
