FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes

When designing Convolutional Neural Networks (CNNs), one must select the size of the convolutional kernels before training. Recent works show that CNNs benefit from different kernel sizes at different layers, but exploring all possible combinations is infeasible in practice. A more efficient approach is to learn the kernel size during training. However, existing works that learn the kernel size have limited bandwidth: they scale kernels by dilation, which limits the detail the kernels can describe. In this work, we propose FlexConv, a novel convolutional operation with which high-bandwidth convolutional kernels of learnable size can be learned at a fixed parameter cost. CNNs equipped with FlexConv (FlexNets) model long-term dependencies without the use of pooling, achieve state-of-the-art performance on several sequential datasets, outperform recent works with learned kernel sizes, and are competitive with much deeper ResNets on image benchmark datasets. Additionally, FlexNets can be deployed at higher resolutions than those seen during training. To avoid aliasing, we propose a novel kernel parameterization with which the frequency of the kernels can be analytically controlled. The proposed parameterization shows higher descriptive power and faster convergence than existing parameterizations, leading to important improvements in classification accuracy.
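
To make the idea concrete, below is a minimal, illustrative sketch of a FlexConv-style 1D layer: a continuous kernel is produced by a small coordinate MLP and modulated by a Gaussian mask whose width is a learnable parameter, so the effective kernel size is trained by gradient descent at fixed parameter cost. The class name, MLP architecture, and hyperparameters are assumptions for illustration, not the authors' exact implementation (which additionally controls kernel frequency to avoid aliasing).

```python
# Hypothetical FlexConv-style layer (PyTorch). Sketch only, under the assumptions above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FlexConv1d(nn.Module):
    def __init__(self, in_channels, out_channels, max_kernel_size=33, hidden=32):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.max_kernel_size = max_kernel_size
        # Continuous kernel: an MLP mapping a 1-D relative coordinate to all filter values.
        self.kernel_net = nn.Sequential(
            nn.Linear(1, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, in_channels * out_channels),
        )
        # Learnable Gaussian mask width (log-parameterized so sigma stays positive);
        # this parameter makes the effective kernel size differentiable.
        self.log_sigma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Relative coordinates in [-1, 1] at which the continuous kernel is sampled.
        coords = torch.linspace(-1.0, 1.0, self.max_kernel_size, device=x.device).unsqueeze(-1)
        kernel = self.kernel_net(coords)                                    # (K, out*in)
        kernel = kernel.t().reshape(self.out_channels, self.in_channels, -1)  # (out, in, K)
        # Gaussian mask: suppresses kernel values far from the center, so the region of
        # support (the "kernel size") grows or shrinks as sigma is learned.
        sigma = self.log_sigma.exp()
        mask = torch.exp(-0.5 * (coords.squeeze(-1) / sigma) ** 2)          # (K,)
        kernel = kernel * mask.view(1, 1, -1)
        return F.conv1d(x, kernel, padding=self.max_kernel_size // 2)


# Usage: a drop-in replacement for nn.Conv1d on sequential data.
layer = FlexConv1d(in_channels=3, out_channels=16)
y = layer(torch.randn(8, 3, 128))  # -> (8, 16, 128)
```

The key design choice this sketch highlights is that the number of parameters does not depend on the sampled kernel size: the MLP and the single mask parameter stay fixed, while the Gaussian window determines how much of the sampled kernel is effectively used.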
