Discovering Internal Representations from Object-CNNs Using Population Encoding

In this paper, we provide a method for understanding the internal representations of Convolutional Neural Networks (CNNs) trained on objects. We hypothesize that the information is distributed across multiple neuronal responses, and we propose a simple clustering technique to extract this information, which we call \emph{population encoding}. Population encoding probes an object-CNN at multiple layers and reveals that mid-level object-part semantics are implicitly present, distributed across the neuronal responses. Our qualitative visualizations show that population encoding extracts mid-level image patches that are visually tighter than the patches that produce high single-filter activations. Moreover, comprehensive quantitative experiments using the object keypoint annotations from the PASCAL3D+ dataset corroborate these visualizations: population encoding outperforms single-filter detectors on the task of object-part detection. In preliminary experiments, we also use the parts detected by population-encoding clusters to uncover compositional relations between adjacent layers. Finally, based on the insights gained from this work, we outline several new directions toward a better understanding of the internal representations of CNNs.
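The abstract describes population encoding only as "a simple clustering technique" applied to distributed neuronal responses. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the population response is taken to be the vector of all channel activations at one spatial location of a layer, each vector is L2-normalized (an assumption), and the vectors are grouped with k-means using k-means++ seeding (our choice of clustering algorithm). All function names (`population_vectors`, `kmeans`, `cluster_locations`, `top_single_filter`) are hypothetical. For contrast, `top_single_filter` implements the single-filter baseline by ranking locations by the activation of one channel.

```python
import numpy as np


def population_vectors(fmap):
    """Turn layer activations of shape (N, C, H, W) into one C-dim vector per location.

    Each row is the joint response of all C filters at one (image, y, x) position.
    L2 normalization is an assumption of this sketch: clustering then depends on
    the pattern of responses rather than on their overall magnitude.
    """
    n, c, h, w = fmap.shape
    X = fmap.transpose(0, 2, 3, 1).reshape(-1, c)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-8)


def assign(X, C):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new center is drawn with probability proportional to D(x)^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        total = d2.sum()
        p = d2 / total if total > 0 else None  # None -> uniform draw if all points coincide
        centers.append(X[rng.choice(len(X), p=p)])
    return np.array(centers)


def kmeans(X, k, iters=100, seed=0):
    """Lloyd iterations from k-means++ seeds; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    C = kmeans_pp_init(X, k, rng)
    for _ in range(iters):
        labels = assign(X, C)
        # Recompute centers; an empty cluster keeps its previous center.
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, assign(X, C)


def cluster_locations(labels, j, fmap_shape):
    """Map the members of cluster j back to (image, y, x) positions in the feature map."""
    n, _, h, w = fmap_shape
    idx = np.flatnonzero(labels == j)
    return list(zip(*np.unravel_index(idx, (n, h, w))))


def top_single_filter(fmap, c, m):
    """Baseline: (image, y, x) positions of the m strongest responses of channel c."""
    resp = fmap[:, c]
    order = np.argsort(resp.ravel())[::-1][:m]
    return [np.unravel_index(i, resp.shape) for i in order]


# Usage on random stand-in activations (not real CNN features).
rng = np.random.default_rng(0)
fmap = np.maximum(rng.normal(size=(2, 16, 6, 6)), 0)  # ReLU-like, shape (N, C, H, W)
X = population_vectors(fmap)
centers, labels = kmeans(X, k=4)
members_of_cluster_0 = cluster_locations(labels, 0, fmap.shape)
```

In this sketch, each cluster plays the role of a part detector: its members are locations whose joint response pattern is similar, even when no single channel responds strongly. The positions returned by `cluster_locations` would then be mapped to receptive-field patches for visualization, which is not shown here.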
