GPOL: Gradient and Probabilistic approach for Object Localization to understand the working of CNNs

Convolutional neural networks (CNNs) have revolutionized computer vision and are used extensively for image classification, object detection, caption generation, and related tasks. CNNs are mostly treated as black boxes whose internal functioning is not known. The objective of this work is to explain the predictions made by a CNN. We propose a new technique for understanding how the middle layers of the network and the classifier operate. The proposed approach can be applied to a variety of models trained for applications such as object detection and recognition. In this work, a gradient-based approach and a probabilistic approach are used for object localization, and the heatmaps produced by the two approaches are combined by taking their geometric mean. In the gradient-based approach, the gradients of the true object class are made to flow into the last convolutional layer in order to determine the most significant points for predicting that particular object. In the probabilistic approach, the CNN's top-down attention is used to generate task-specific attention maps. A probabilistic scheme for selecting significant neurons in the network is applied while signals are backpropagated from the top to the bottom of the network hierarchy. The proposed method is evaluated on the CLS-LOC dataset, which is part of the ImageNet dataset. It is then compared with previously developed techniques such as saliency maps, SmoothGrad, Grad-CAM, and top-down neural attention to demonstrate the better accuracy of the proposed work.
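The fusion step described above can be illustrated with a minimal NumPy sketch. It assumes the last-layer activations and class gradients are already available as arrays of shape (K, H, W), and that the probabilistic attention map has been computed separately and resized to the same H x W grid. The function names, the Grad-CAM-style channel weighting, and the min-max normalization are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def gradient_heatmap(activations, gradients):
    """Grad-CAM-style map (assumed form of the gradient-based branch).

    activations, gradients: arrays of shape (K, H, W) taken from the
    last convolutional layer for the target class.
    """
    # One weight per channel: spatial average of the class gradients.
    weights = gradients.mean(axis=(1, 2))               # shape (K,)
    # Weighted sum of the activation channels.
    cam = np.tensordot(weights, activations, axes=1)    # shape (H, W)
    # Keep only positive evidence for the class.
    return np.maximum(cam, 0.0)

def normalize(heatmap, eps=1e-8):
    """Min-max scale a heatmap to roughly [0, 1] (an assumed choice)."""
    shifted = heatmap - heatmap.min()
    return shifted / (shifted.max() + eps)

def fuse_geometric_mean(grad_map, prob_map):
    """Pixel-wise geometric mean of two same-sized heatmaps."""
    g = normalize(grad_map)
    p = normalize(prob_map)
    return np.sqrt(g * p)

# Hypothetical inputs standing in for real network outputs.
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
prob_map = rng.random((7, 7))   # placeholder for the top-down attention map

fused = fuse_geometric_mean(gradient_heatmap(acts, grads), prob_map)
```

Because the geometric mean is a product under a square root, a location scores highly in the fused map only when both branches agree on it; a location that either branch scores at zero is suppressed entirely.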
