Rescaling CNN through Learnable Repetition of Network Parameters

Deeper and wider CNNs are known to provide improved performance on deep learning tasks. However, most such networks offer poor performance gain per added parameter. In this paper, we investigate whether the gains observed in deeper models are purely due to the addition of more optimization parameters or whether the physical size of the network also plays a role. Further, we present a novel rescaling strategy for CNNs based on learnable repetition of their parameters. Using this strategy, we rescale CNNs without changing their parameter count, and show that learnable sharing of weights alone can provide a significant boost in the performance of a given model. We show that small base networks, when rescaled, can provide performance comparable to deeper networks while using as little as 6% of the deeper network's optimization parameters. The relevance of weight sharing is further highlighted through the example of group-equivariant CNNs. We show that the significant improvements obtained with group-equivariant CNNs over regular CNNs on classification problems are only partly due to the added equivariance property; part of the gain comes from the learnable repetition of network weights. For the rot-MNIST dataset, we show that up to 40% of the relative gain reported by state-of-the-art methods for rotation equivariance could actually be due to just the learnt repetition of weights.
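The core idea, increasing depth by repeating a shared convolution kernel together with a small set of learnable modulation parameters, can be sketched as below. This is a minimal illustrative sketch only, assuming a per-repeat, per-channel learnable scale as one possible form of "learnable repetition"; the names RepeatedConvBlock, num_repeats, and scales are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RepeatedConvBlock(nn.Module):
    """Applies one shared convolution kernel several times in sequence, with a
    learnable per-repeat, per-channel scale, so depth grows while the
    parameter count stays almost constant. (Illustrative sketch, not the
    authors' exact formulation.)"""

    def __init__(self, channels, kernel_size=3, num_repeats=3):
        super().__init__()
        # One shared kernel: its size is independent of num_repeats.
        self.shared_conv = nn.Conv2d(channels, channels, kernel_size,
                                     padding=kernel_size // 2, bias=False)
        # Learnable modulation: only num_repeats * channels extra parameters.
        self.scales = nn.Parameter(torch.ones(num_repeats, channels))
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(channels) for _ in range(num_repeats)])

    def forward(self, x):
        for i, bn in enumerate(self.bns):
            # Reuse the shared kernel, modulated by the i-th learnt scale.
            w = self.shared_conv.weight * self.scales[i].view(-1, 1, 1, 1)
            x = F.relu(bn(F.conv2d(x, w, padding=self.shared_conv.padding)))
        return x


if __name__ == "__main__":
    block = RepeatedConvBlock(channels=16, num_repeats=4)
    y = block(torch.randn(2, 16, 32, 32))
    print(y.shape)  # torch.Size([2, 16, 32, 32])
```

In this sketch the shared kernel contributes parameters only once, while each repetition adds just a per-channel scale and a BatchNorm layer, so increasing num_repeats deepens the network at nearly constant parameter cost.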
