Dilated convolution with learnable spacings

Dilated convolution is basically a convolution with a wider kernel created by regularly inserting spaces between the kernel elements. In this article, we present a new version of the dilated convolution in which the spacings are made learnable via backpropagation through an interpolation technique. We call this method “Dilated Convolution with Learnable Spacings” (DCLS) and we generalize its approach to the ndimensional convolution case. However, our main focus here will be the 2D case for which we developed two implementations: a naive one that constructs the dilated kernel, suitable for small dilation rates, and a more time/memory efficient one that uses a modified version of the “im2col” algorithm. We then illustrate how this technique improves the accuracy of existing architectures on semantic segmentation task on Pascal Voc 2012 dataset via a simple drop-in replacement of the classical dilated convolutional layers by DCLS ones. Furthermore, we show that DCLS allows to reduce the number of learnable parameters of the depthwise convolutions used in the recent ConvMixer architecture by a factor 3 with no or very low reduction in accuracy and that by replacing large dense kernels with sparse DCLS ones. The code of the method is based on Pytorch and available at: https: //github.com/K-H-Ismail/Dilated-Convol ution-with-Learnable-Spacings-PyTorch.

[1]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[2]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[5]  Andre Araujo,et al.  Computing Receptive Fields of Convolutional Neural Networks , 2019, Distill.

[6]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[7]  Patrice Y. Simard,et al.  High Performance Convolutional Neural Networks for Document Processing , 2006 .

[8]  Raquel Urtasun,et al.  Understanding the Effective Receptive Field in Deep Convolutional Neural Networks , 2016, NIPS.

[9]  J. Zico Kolter,et al.  Patches Are All You Need? , 2022, Trans. Mach. Learn. Res..

[10]  Fuseini Mumuni,et al.  CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review , 2021, SN Computer Science.

[11]  Richard Kronland-Martinet,et al.  A real-time algorithm for signal analysis with the help of the wavelet transform , 1989 .

[12]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[16]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[17]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Mark J. Shensa,et al.  The discrete wavelet transform: wedding the a trous and Mallat algorithms , 1992, IEEE Trans. Signal Process..