On Translation Invariance in CNNs: Convolutional Layers Can Exploit Absolute Spatial Location

In this paper we challenge the common assumption that convolutional layers in modern CNNs are translation invariant. We show that CNNs can and will exploit the absolute spatial location by learning filters that respond exclusively to particular absolute locations by exploiting image boundary effects. Because modern CNNs filters have a huge receptive field, these boundary effects operate even far from the image boundary, allowing the network to exploit absolute spatial location all over the image. We give a simple solution to remove spatial location encoding which improves translation invariance and thus gives a stronger visual inductive bias which particularly benefits small data sets. We broadly demonstrate these benefits on several architectures and various applications such as image classification, patch matching, and two video classification datasets.

[1]  Dawn Song,et al.  Natural Adversarial Examples , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Sen Jia,et al.  How Much Position Information Do Convolutional Neural Networks Encode? , 2020, ICLR.

[3]  A. Smeulders,et al.  Scale-Equivariant Steerable Networks , 2019, ICLR.

[4]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[5]  Qiang Chen,et al.  Location-aware Upsampling for Semantic Segmentation , 2019, ArXiv.

[6]  Andre Araujo,et al.  Computing Receptive Fields of Convolutional Neural Networks , 2019, Distill.

[7]  Niloy J. Mitra,et al.  Learning on the Edge: Investigating Boundary Filters in CNNs , 2019, International Journal of Computer Vision.

[8]  Shiguang Shan,et al.  Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation , 2019, ArXiv.

[9]  Tao Shen,et al.  FaceBagNet: Bag-Of-Local-Features Model for Multi-Modal Face Anti-Spoofing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[10]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Luc Van Gool,et al.  Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Peer Neubert,et al.  Circular Convolutional Neural Networks for Panoramic Images and Laser Data , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[13]  Daniel E. Worrall,et al.  Deep Scale-spaces: Equivariance Over Scale , 2019, NeurIPS.

[14]  Ion Stoica,et al.  Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules , 2019, ICML.

[15]  Richard Zhang,et al.  Making Convolutional Networks Shift-Invariant Again , 2019, ICML.

[16]  Ondrej Chum,et al.  Explicit Spatial Encoding for Deep Local Descriptors , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Matthias Bethge,et al.  Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet , 2019, ICLR.

[18]  Alexander Lerchner,et al.  Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs , 2019, ArXiv.

[19]  Nikos Komodakis,et al.  Scattering Networks for Hybrid Representation Learning , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Yair Weiss,et al.  Why do deep convolutional networks generalize so poorly to small image transformations? , 2018, J. Mach. Learn. Res..

[21]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Aleksander Madry,et al.  Exploring the Landscape of Spatial Robustness , 2017, ICML.

[23]  Julien Mairal,et al.  Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations , 2017, J. Mach. Learn. Res..

[24]  MairalJulien,et al.  Group invariance, stability to deformations, and complexity of deep convolutional representations , 2019 .

[25]  Ting-Chun Wang,et al.  Partial Convolution based Padding , 2018, ArXiv.

[26]  Devis Tuia,et al.  Scale equivariance in CNNs with vector fields , 2018, ArXiv.

[27]  Olga Veksler,et al.  Location Augmentation for CNN , 2018, ArXiv.

[28]  Lei Zhou,et al.  GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints , 2018, ECCV.

[29]  Prafulla Dhariwal,et al.  Glow: Generative Flow with Invertible 1x1 Convolutions , 2018, NeurIPS.

[30]  Jason Yosinski,et al.  An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution , 2018, NeurIPS.

[31]  Min Sun,et al.  Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Peter König,et al.  Data augmentation instead of explicit regularization , 2018, ArXiv.

[33]  Risi Kondor,et al.  On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups , 2018, ICML.

[34]  Eric Kauderer-Abrams,et al.  Quantifying Translation-Invariance in Convolutional Neural Networks , 2017, ArXiv.

[35]  Andrea Vedaldi,et al.  Deep Image Prior , 2017, International Journal of Computer Vision.

[36]  Yutaka Satoh,et al.  Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Seyed-Mohsen Moosavi-Dezfooli,et al.  Geometric Robustness of Deep Networks: Analysis and Improvement , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Maurice Weiler,et al.  Learning Steerable Filters for Rotation Equivariant CNNs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Max Welling,et al.  Modeling Relational Data with Graph Convolutional Networks , 2017, ESWC.

[40]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Xiuwen Liu,et al.  A patch-based convolutional neural network for remote sensing image classification , 2017, Neural Networks.

[42]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.

[43]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Juergen Gall,et al.  A bag-of-words equivalent recurrent neural network for action recognition , 2017, Comput. Vis. Image Underst..

[45]  Qiang Qiu,et al.  Oriented Response Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Nikos Komodakis,et al.  Rotation Equivariant Vector Field Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[47]  Stephan J. Garbin,et al.  Harmonic Networks: Deep Translation and Rotation Equivariance , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Aren Jansen,et al.  CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[50]  Andrea Vedaldi,et al.  Warped Convolutions: Efficient Invariance to Spatial Transformations , 2016, ICML.

[51]  Mark Sandler,et al.  Convolutional recurrent neural networks for music classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[52]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Justin Salamon,et al.  Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification , 2016, IEEE Signal Processing Letters.

[54]  Matthew Richardson,et al.  Do Deep Convolutional Nets Really Need to be Deep and Convolutional? , 2016, ICLR.

[55]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[56]  Pascal Frossard,et al.  Adaptive data augmentation for image classification , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[57]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Arnold W. M. Smeulders,et al.  Structured Receptive Fields in CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Joachim M. Buhmann,et al.  TI-POOLING: Transformation-Invariant Pooling for Feature Learning in Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Max Welling,et al.  Group Equivariant Convolutional Networks , 2016, ICML.

[61]  Koray Kavukcuoglu,et al.  Exploiting Cyclic Symmetry in Convolutional Neural Networks , 2016, ICML.

[62]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[64]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[65]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Joel H. Saltz,et al.  Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Wilhelm Burger,et al.  Digital Image Processing - An Algorithmic Introduction using Java , 2008, Texts in Computer Science.

[68]  Victor S. Lempitsky,et al.  Aggregating Local Deep Features for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[69]  Alán Aspuru-Guzik,et al.  Convolutional Networks on Graphs for Learning Molecular Fingerprints , 2015, NIPS.

[70]  Pascal Frossard,et al.  Manitest: Are classifiers really invariant? , 2015, BMVC.

[71]  Iasonas Kokkinos,et al.  Deep Filter Banks for Texture Recognition, Description, and Segmentation , 2015, International Journal of Computer Vision.

[72]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[74]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[75]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76]  Jun Zhao,et al.  Recurrent Convolutional Neural Networks for Text Classification , 2015, AAAI.

[77]  Stéphane Mallat,et al.  Deep roto-translation scattering for object classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[79]  Andrea Vedaldi,et al.  Understanding Image Representations by Measuring Their Equivariance and Equivalence , 2014, International Journal of Computer Vision.

[80]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[81]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[82]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[83]  Pedro M. Domingos,et al.  Deep Symmetry Networks , 2014, NIPS.

[84]  Jiaxing Zhang,et al.  Scale-Invariant Convolutional Neural Networks , 2014, ArXiv.

[85]  Trevor Darrell,et al.  Do Convnets Learn Correspondence? , 2014, NIPS.

[86]  Gerald Penn,et al.  Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[87]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[88]  Cícero Nogueira dos Santos,et al.  Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts , 2014, COLING.

[89]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[90]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[91]  Joan Bruna,et al.  Spectral Networks and Locally Connected Networks on Graphs , 2013, ICLR.

[92]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[93]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[94]  Stéphane Mallat,et al.  Invariant Scattering Convolution Networks , 2012, IEEE transactions on pattern analysis and machine intelligence.

[95]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[96]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[97]  Honglak Lee,et al.  Learning Invariant Representations with Local Transformations , 2012, ICML.

[98]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[99]  Zhenghao Chen,et al.  On Random Weights and Unsupervised Feature Learning , 2011, ICML.

[100]  Jan C. van Gemert,et al.  Exploiting photographic style for category-level image classification by generalizing the spatial pyramid , 2011, ICMR.

[101]  Daniel A. Griffith,et al.  An evaluation of correction techniques for boundary effects in spatial statistical analysis: traditional methods , 2010 .

[102]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[103]  Jiaya Jia,et al.  Reducing boundary artifacts in image deconvolution , 2008, 2008 15th IEEE International Conference on Image Processing.

[104]  Matthew A. Brown,et al.  Automatic Panoramic Image Stitching using Invariant Features , 2007, International Journal of Computer Vision.

[105]  Stanley J. Reeves,et al.  Fast image restoration without boundary artifacts , 2005, IEEE Transactions on Image Processing.

[106]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[107]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[108]  F. Aghdasi,et al.  Reduction of boundary artifacts in image restoration , 1996, IEEE Trans. Image Process..

[109]  Y. Meyer,et al.  Wavelets and Filter Banks , 1991 .

[110]  D. Griffith,et al.  The boundary value problem in spatial statistical analysis. , 1983, Journal of regional science.