CNN Fixations: An Unraveling Approach to Visualize the Discriminative Image Regions

Deep convolutional neural networks (CNNs) have revolutionized the computer vision research and have seen unprecedented adoption for multiple tasks, such as classification, detection, and caption generation. However, they offer little transparency into their inner workings and are often treated as black boxes that deliver excellent performance. In this paper, we aim at alleviating this opaqueness of CNNs by providing visual explanations for the network’s predictions. Our approach can analyze a variety of CNN-based models trained for computer vision applications, such as object recognition and caption generation. Unlike the existing methods, we achieve this via unraveling the forward pass operation. The proposed method exploits feature dependencies across the layer hierarchy and uncovers the discriminative image locations that guide the network’s predictions. We name these locations CNN fixations, loosely analogous to human eye fixations. Our approach is a generic method that requires no architectural changes, additional training, or gradient computation, and computes the important image locations (CNN fixations). We demonstrate through a variety of applications that our approach is able to localize the discriminative image locations across different network architectures, diverse vision tasks, and data modalities.

[1]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[2]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[3]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Qian Yu,et al.  Sketch-aNet that Beats Humans , 2015 .

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Dumitru Erhan,et al.  Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[11]  Zhe L. Lin,et al.  Top-Down Neural Attention by Excitation Backprop , 2016, International Journal of Computer Vision.

[12]  Yan Wang,et al.  A Powerful Generative Model Using Random Weights for the Deep Image Representation , 2016, NIPS.

[13]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[15]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[16]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[17]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[18]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[19]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[20]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[21]  Dragomir Anguelov,et al.  Self-taught object localization with deep networks , 2014, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[22]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  R. Venkatesh Babu,et al.  Fast Feature Fool: A data independent approach to universal adversarial perturbations , 2017, BMVC.

[24]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Deepu Rajan,et al.  Backtracking ScSPM Image Classifier for Weakly Supervised Top-Down Saliency , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[28]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[29]  Marko Robnik-Sikonja,et al.  Explaining Classifications For Individual Instances , 2008, IEEE Transactions on Knowledge and Data Engineering.

[30]  Philip H. S. Torr,et al.  BING: Binarized normed gradients for objectness estimation at 300fps , 2014, Computational Visual Media.

[31]  Bin Yang,et al.  CRAFT Objects from Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Marc Alexa,et al.  How do humans sketch objects? , 2012, ACM Trans. Graph..

[33]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[34]  Ravi Kiran Sarvadevabhatla,et al.  Enabling My Robot To Play Pictionary: Recurrent Neural Networks For Sketch Recognition , 2016, ACM Multimedia.

[35]  Tao Xiang,et al.  Sketch-a-Net that Beats Humans , 2015, BMVC.

[36]  Cordelia Schmid,et al.  Accurate Object Localization with Shape Masks , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Alexander Binder,et al.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.

[39]  Max Welling,et al.  Visualizing Deep Neural Network Decisions: Prediction Difference Analysis , 2017, ICLR.

[40]  Seyed-Mohsen Moosavi-Dezfooli,et al.  DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.