F-CAM: Full Resolution Class Activation Maps via Guided Parametric Upscaling

Class Activation Mapping (CAM) methods have recently gained much attention for weakly-supervised object localization (WSOL) tasks. They allow for CNN visualization and interpretation without training on fully annotated image datasets. CAM methods are typically integrated within off-theshelf CNN backbones, such as ResNet50. Due to convolution and pooling operations, these backbones yield low resolution CAMs with a down-scaling factor of up to 32, contributing to inaccurate localizations. Interpolation is required to restore full size CAMs, yet it does not consider the statistical properties of objects, such as color and texture, leading to activations with inconsistent boundaries, and inaccurate localizations. As an alternative, we introduce a generic method for parametric upscaling of CAMs that allows constructing accurate full resolution CAMs (F-CAMs). In particular, we propose a trainable decoding architecture that can be connected to any CNN classifier to produce highly accurate CAM localizations. Given an original low resolution CAM, foreground and background pixels are randomly sampled to fine-tune the decoder. Additional priors such as image statistics and size constraints are also considered to expand and refine object boundaries. Extensive experiments1, over three CNN backbones and six WSOL baselines on the CUB-200-2011 and OpenImages datasets, indicate that our F-CAM method yields a significant improvement in CAM localization accuracy. F-CAM performance is competitive with state-of-art WSOL methods, yet it requires fewer computations during inference.

[1]  Jianxin Wu,et al.  Rethinking the Route Towards Weakly Supervised Object Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jose Dolz,et al.  Deep Interpretable Classification and Weakly-Supervised Segmentation of Histology Images via Max-Min Uncertainty , 2020, IEEE Transactions on Medical Imaging.

[3]  Andreas Stafylopatis,et al.  High-Resolution Class Activation Mapping , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[4]  Ismail Ben Ayed,et al.  On Regularized Losses for Weakly-supervised CNN Segmentation , 2018, ECCV.

[5]  Marco Pedersoli,et al.  Convolutional STN for Weakly Supervised Object Localization , 2019, 2020 25th International Conference on Pattern Recognition (ICPR).

[6]  Yao Zhao,et al.  Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Yunchao Wei,et al.  Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[10]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Bolei Zhou,et al.  TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization , 2021, ArXiv.

[12]  Andrew Adams,et al.  Fast High‐Dimensional Filtering Using the Permutohedral Lattice , 2010, Comput. Graph. Forum.

[13]  Mingjie Sun,et al.  Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach , 2019, AAAI.

[14]  Daniel Omeiza,et al.  Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models , 2019, ArXiv.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Hyunjung Shim,et al.  Attention-Based Dropout Layer for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Yunchao Wei,et al.  Inter-Image Communication for Weakly Supervised Localization , 2020, ECCV.

[18]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[19]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[20]  Andrea Vedaldi,et al.  Deep Image Prior , 2017, International Journal of Computer Vision.

[21]  Seong Joon Oh,et al.  CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Youngjung Uh,et al.  In-sample Contrastive Learning and Consistent Attention for Weakly Supervised Object Localization , 2020, ACCV.

[23]  Chang Liu,et al.  DANet: Divergent Activation for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[25]  Thomas A. Funkhouser,et al.  Dilated Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Ming-Ming Cheng,et al.  LayerCAM: Exploring Hierarchical Class Activation Maps for Localization , 2021, IEEE Transactions on Image Processing.

[28]  Rakshit Naidu,et al.  SS-CAM: Smoothed Score-CAM for Sharper Visual Feature Localization , 2020, ArXiv.

[29]  Jae-Gil Lee,et al.  Learning from Noisy Labels with Deep Neural Networks: A Survey , 2020, ArXiv.

[30]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[31]  Changick Kim,et al.  Combinational Class Activation Maps for Weakly Supervised Object Localization , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[32]  Meng Yang,et al.  Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34]  Xiaohu Dong,et al.  Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs , 2020, BMVC.

[35]  Sungroh Yoon,et al.  FickleNet: Weakly and Semi-Supervised Semantic Image Segmentation Using Stochastic Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[37]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Yi Yang,et al.  Adversarial Complementary Learning for Weakly Supervised Object Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Zhi-Hua Zhou,et al.  A brief introduction to weakly supervised learning , 2018 .

[40]  Thomas Deselaers,et al.  Weakly Supervised Localization and Learning with Generic Knowledge , 2012, International Journal of Computer Vision.

[41]  Ismail Ben Ayed,et al.  Deep Ordinal Classification with Inequality Constraints , 2019, ArXiv.

[42]  Shuguang Cui,et al.  Shallow Feature Matters for Weakly Supervised Object Localization , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  N. Otsu A threshold selection method from gray level histograms , 1979 .

[44]  Ismail Ben Ayed,et al.  Deep Active Learning for Joint Classification & Segmentation with Weak Annotator , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[45]  Matthieu Cord,et al.  WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[47]  Matthieu Cord,et al.  Training data-efficient image transformers & distillation through attention , 2020, ICML.

[48]  Seong Joon Oh,et al.  Evaluating Weakly Supervised Object Localization Methods Right , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[50]  Yi Yang,et al.  Self-produced Guidance for Weakly-supervised Object Localization , 2018, ECCV.

[51]  Ankita Ghosh,et al.  IS-CAM: Integrated Score-CAM for axiomatic-based explanations , 2020, ArXiv.

[52]  Zijian Zhang,et al.  Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[53]  Rodrigo Benenson,et al.  Large-Scale Interactive Object Segmentation With Human Annotators , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Vineeth N. Balasubramanian,et al.  Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[55]  Qixiang Ye,et al.  Min-Entropy Latent Model for Weakly Supervised Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.