Fixing Localization Errors to Improve Image Classification

Deep neural networks are generally considered black-box models that offer less interpretability for their decision process. To address this limitation, Class Activation Map [46] (CAM) provides an attractive solution that visualizes class-specific discriminative regions in an input image. The remarkable ability of CAMs to locate class discriminating regions has been exploited in weakly-supervised segmentation and localization tasks. In this work, we explore a new direction towards the possible use of CAM in deep network learning process. We note that such visualizations lend insights into the workings of deep CNNs and could be leveraged to introduce additional constraints during the learning stage. Specifically, the CAMs for negative classes (negative CAMs) often have false activations even though those classes are absent from an image. Thereby, we propose a loss function that seeks to minimize peaks within the negative CAMs, called ‘Homogeneous Negative CAM ’ loss. This way, in an effort to fix localization errors, our loss provides an extra supervisory signal that helps the model to better discriminate between similar classes. Our designed loss function is easy to implement and can be readily integrated into existing DNNs. We evaluate it on a number of classification tasks including large-scale recognition, multilabel classification and fine-grained recognition. Our loss provides better performance compared to other loss functions across the studied tasks. Additionally, we show that the proposed loss function provides higher robustness against adversarial attacks and noisy labels.

[1]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Mubarak Shah,et al.  Fast Zero-Shot Image Tagging , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Seong-Gyun Jeong,et al.  Anchor Loss: Modulating Loss Scale Based on Prediction Difficulty , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Pascal Vincent,et al.  Visualizing Higher-Layer Features of a Deep Network , 2009 .

[6]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[7]  Seong Joon Oh,et al.  Evaluating Weakly Supervised Object Localization Methods Right , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[9]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[10]  Suha Kwak,et al.  Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jiansheng Chen,et al.  Rethinking Feature Distribution for Loss Functions in Image Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Nenghai Yu,et al.  Learning Spatial Regularization with Image-Level Supervisions for Multi-label Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Mohammed Bennamoun,et al.  A Guide to Convolutional Neural Networks for Computer Vision , 2018, A Guide to Convolutional Neural Networks for Computer Vision.

[14]  Mert R. Sabuncu,et al.  Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels , 2018, NeurIPS.

[15]  Ling Shao,et al.  Gaussian Affinity for Max-Margin Class Imbalanced Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Chen Huang,et al.  Learning Deep Representation for Imbalanced Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[18]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[19]  Yangqing Jia,et al.  Deep Convolutional Ranking for Multilabel Image Annotation , 2013, ICLR.

[20]  Qiang Qiu,et al.  Weakly Supervised Instance Segmentation Using Class Peak Response , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Jian Cheng,et al.  Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[22]  Ramesh Raskar,et al.  Maximum-Entropy Fine-Grained Classification , 2018, NeurIPS.

[23]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[24]  Alan L. Yuille,et al.  Feature Denoising for Improving Adversarial Robustness , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Wei Wei,et al.  Improving Adversarial Robustness via Guided Complement Entropy , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  F. Khan,et al.  Object Counting and Instance Segmentation With Image-Level Supervision , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Luc Van Gool,et al.  Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation , 2020, ECCV.

[28]  Kun Yi,et al.  Probabilistic End-To-End Noise Correction for Learning With Noisy Labels , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Meng Yang,et al.  Large-Margin Softmax Loss for Convolutional Neural Networks , 2016, ICML.

[30]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[31]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[32]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[34]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[35]  Andrea Vedaldi,et al.  Visualizing Deep Convolutional Neural Networks Using Natural Pre-images , 2015, International Journal of Computer Vision.

[36]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Kaiqi Huang,et al.  Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios , 2015, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[38]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[39]  Thomas Brox,et al.  Inverting Visual Representations with Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Yang Song,et al.  Class-Balanced Loss Based on Effective Number of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Ling Shao,et al.  Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Shaogang Gong,et al.  Class Rectification Hard Mining for Imbalanced Deep Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  Min-Ling Zhang,et al.  A Review on Multi-Label Learning Algorithms , 2014, IEEE Transactions on Knowledge and Data Engineering.

[45]  Mohammed Bennamoun,et al.  Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[46]  Hao Guo,et al.  Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[48]  Yi Zhu,et al.  Soft Proposal Networks for Weakly Supervised Object Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).