Paper information - On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location

In this paper we challenge the common assumption that convolutional layers in modern CNNs are translation invariant. We show that CNNs can and will exploit absolute spatial location by learning filters that respond exclusively to particular absolute locations, using image boundary effects. Because modern CNN filters have a huge receptive field, these boundary effects operate even far from the image boundary, allowing the network to exploit absolute spatial location all over the image. We give a simple solution to remove spatial location encoding, which improves translation invariance and thus gives a stronger visual inductive bias that particularly benefits small data sets. We broadly demonstrate these benefits on several architectures and various applications such as image classification, patch matching, and two video classification datasets.
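
The boundary effect described in the abstract can be illustrated with a small 1D sketch. It is not taken from the paper: the helper `conv1d_same`, the toy signals, and the use of circular padding as a comparison are all assumptions chosen for illustration, and circular padding is not necessarily the solution the authors propose. Under zero padding, a constant input produces different responses at the borders than in the interior, so position leaks into the features. Under circular padding, the same filter responds uniformly and commutes with shifts.

```python
import numpy as np

def conv1d_same(signal, kernel, mode):
    """Cross-correlate signal with an odd-length kernel, keeping the input length.

    mode="zero" pads with zeros; mode="circular" wraps the signal around.
    (Hypothetical helper for illustration only.)
    """
    pad = len(kernel) // 2
    if mode == "zero":
        padded = np.pad(signal, pad, mode="constant", constant_values=0)
    elif mode == "circular":
        padded = np.pad(signal, pad, mode="wrap")
    else:
        raise ValueError("mode must be 'zero' or 'circular'")
    return np.correlate(padded, kernel, mode="valid")

kernel = np.ones(3)

# 1) A constant signal carries no position information of its own,
#    yet zero padding makes the border responses differ from the interior.
flat = np.ones(8)
zero_flat = conv1d_same(flat, kernel, "zero")
circ_flat = conv1d_same(flat, kernel, "circular")

# 2) Shift test: does filtering a shifted signal equal shifting the filtered signal?
ramp = np.arange(1.0, 9.0)
shift = 1
zero_commutes = np.array_equal(
    conv1d_same(np.roll(ramp, shift), kernel, "zero"),
    np.roll(conv1d_same(ramp, kernel, "zero"), shift),
)
circ_commutes = np.array_equal(
    conv1d_same(np.roll(ramp, shift), kernel, "circular"),
    np.roll(conv1d_same(ramp, kernel, "circular"), shift),
)

print("zero padding, constant input:    ", zero_flat)
print("circular padding, constant input:", circ_flat)
print("zero padding commutes with shift:    ", zero_commutes)
print("circular padding commutes with shift:", circ_commutes)
```

In this toy setting a single layer only sees the boundary within one kernel width. The abstract's point is that stacked layers with large receptive fields carry the same effect far into the image.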

Jan C. van Gemert | Osman Semih Kayhan
