How Much Position Information Do Convolutional Neural Networks Encode?

In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. An implication of this is that a filter may know what it is looking at, but not where it is positioned in the image. Information concerning absolute position is inherently useful, and it is reasonable to assume that deep CNNs may implicitly learn to encode this information if there is a means to do so. In this paper, we test this hypothesis revealing the surprising degree of absolute position information that is encoded in commonly used neural networks. A comprehensive set of experiments show the validity of this hypothesis and shed light on how and where this information is represented while offering clues to where positional information is derived from in deep CNNs.

[1]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[2]  Sen Jia,et al.  Richer and Deeper Supervision Network for Salient Object Detection , 2019, ArXiv.

[3]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[4]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Yee Whye Teh,et al.  Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation , 2006, Cogn. Sci..

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[8]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  James M. Rehg,et al.  The Secrets of Salient Object Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Geoffrey E. Hinton,et al.  Dynamic Routing Between Capsules , 2017, NIPS.

[11]  Ming-Hsuan Yang,et al.  PiCANet: Learning Pixel-Wise Contextual Attention for Saliency Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[13]  Sen Jia,et al.  EML-NET: An Expandable Multi-Layer NETwork for Saliency Prediction , 2018, Image Vis. Comput..

[14]  Junwei Han,et al.  PiCANet: Learning Pixel-wise Contextual Attention in ConvNets and Its Application in Saliency Detection , 2017, ArXiv.

[15]  Neil D. B. Bruce,et al.  Revisiting Salient Object Detection: Simultaneous Detection, Ranking, and Subitizing of Multiple Salient Objects , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Yoshua Bengio,et al.  ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks , 2015, ArXiv.

[17]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Huchuan Lu,et al.  Learning to Detect Salient Objects with Image-Level Supervision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Rita Cucchiara,et al.  Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model , 2016, IEEE Transactions on Image Processing.

[21]  Hongjing Lu,et al.  Deep convolutional networks do not classify based on global object shape , 2018, PLoS Comput. Biol..

[22]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[23]  Yang Wang,et al.  Gated Feedback Refinement Network for Dense Image Labeling , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Zachary Chase Lipton The mythos of model interpretability , 2016, ACM Queue.

[25]  Huchuan Lu,et al.  Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[27]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Pascal Vincent,et al.  Visualizing Higher-Layer Features of a Deep Network , 2009 .

[29]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.