Grid Saliency for Context Explanations of Semantic Segmentation

Recently, there has been a growing interest in developing saliency methods that provide visual explanations of network predictions. Still, the usability of existing methods is limited to image classification models. To overcome this limitation, we extend the existing approaches to generate grid saliencies, which provide spatially coherent visual explanations for (pixel-level) dense prediction networks. As the proposed grid saliency allows to spatially disentangle the object and its context, we specifically explore its potential to produce context explanations for semantic segmentation networks, discovering which context most influences the class predictions inside a target object area. We investigate the effectiveness of grid saliency on a synthetic dataset with an artificially induced bias between objects and their context as well as on the real-world Cityscapes dataset using state-of-the-art segmentation networks. Our results show that grid saliency can be successfully used to provide easily interpretable context explanations and, moreover, can be employed for detecting and localizing contextual biases present in the data.

[1]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.

[2]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Trevor Darrell,et al.  Generating Visual Explanations , 2016, ECCV.

[4]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[5]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Trevor Darrell,et al.  Multimodal Explanations: Justifying Decisions and Pointing to the Evidence , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Alexander Binder,et al.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.

[8]  Mark Lee,et al.  On Physical Adversarial Patches for Object Detection , 2019, ArXiv.

[9]  Bo Zhang,et al.  Improving Interpretability of Deep Neural Networks with Semantic Information , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  David Duvenaud,et al.  Explaining Image Classifiers by Counterfactual Generation , 2018, ICLR.

[11]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[12]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Eugenio Culurciello,et al.  ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[14]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[15]  Peter V. Gehler,et al.  Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Seunghoon Hong,et al.  Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation , 2015, NIPS.

[17]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[18]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[20]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21]  Ankur Taly,et al.  Axiomatic Attribution for Deep Networks , 2017, ICML.

[22]  Andrea Vedaldi,et al.  Interpretable Explanations of Black Boxes by Meaningful Perturbation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Arnold W. M. Smeulders,et al.  Interpreting Adversarial Examples with Attributes , 2019, ArXiv.

[24]  Iasonas Kokkinos,et al.  Fast, Exact and Multi-scale Inference for Semantic Image Segmentation with Deep Gaussian CRFs , 2016, ECCV.

[25]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[26]  Charless C. Fowlkes,et al.  Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation , 2016, ECCV.

[27]  Eric P. Xing,et al.  Contextual Explanation Networks , 2017, J. Mach. Learn. Res..

[28]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Roberto Cipolla,et al.  Fast-SCNN: Fast Semantic Segmentation Network , 2019, BMVC.

[30]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[31]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Tinne Tuytelaars,et al.  Modeling Visual Compatibility through Hierarchical Mid-level Elements , 2016, ArXiv.

[33]  Max Welling,et al.  Visualizing Deep Neural Network Decisions: Prediction Difference Analysis , 2017, ICLR.

[34]  Davide Mazzini,et al.  Guided Upsampling Network for Real-Time Semantic Segmentation , 2018, BMVC.

[35]  Martin Wattenberg,et al.  SmoothGrad: removing noise by adding noise , 2017, ArXiv.

[36]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Arnold W. M. Smeulders,et al.  The Visual Extent of an Object , 2011, International Journal of Computer Vision.

[38]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39]  Joost van de Weijer,et al.  Context Proposals for Saliency Detection , 2018, Comput. Vis. Image Underst..

[40]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[41]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[42]  Yarin Gal,et al.  Real Time Image Saliency for Black Box Classifiers , 2017, NIPS.

[43]  Bastian Leibe,et al.  Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Abhishek Das,et al.  Grad-CAM: Why did you say that? , 2016, ArXiv.

[48]  Been Kim,et al.  Sanity Checks for Saliency Maps , 2018, NeurIPS.

[49]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[50]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.