GaterNet: Dynamic Filter Selection in Convolutional Neural Network via a Dedicated Global Gating Network

The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample being processed. In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). The problem is interesting because forcing different parts of the model to learn from different types of samples may help us acquire better filters in CNNs, improve the model's generalization performance, and potentially increase the interpretability of model behavior. We propose a novel yet simple framework called GaterNet, which involves a backbone network and a gater network. The backbone network is a regular CNN that performs the major computation needed for making a prediction, while a global gater network is introduced to generate binary gates that selectively activate filters in the backbone network based on each input. Extensive experiments on the CIFAR and ImageNet datasets show that our models consistently outperform the original models by a large margin. On CIFAR-10, our model also improves upon state-of-the-art results.
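As a rough illustration of the mechanism described above, the sketch below shows one way input-dependent binary gating of backbone filters could look in PyTorch. All module names, layer sizes, and the straight-through binarization are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of the GaterNet idea (not the authors' code): a small gater
# network maps the input to per-filter binary gates, which are applied
# channel-wise to the backbone's convolutional feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Gater(nn.Module):
    """Hypothetical gater: encodes the input and emits one gate per backbone filter."""
    def __init__(self, in_channels, num_gates, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(hidden, num_gates)

    def forward(self, x):
        logits = self.fc(self.encoder(x).flatten(1))
        # Binarize with a straight-through estimator (one common choice;
        # the paper's exact discretization scheme may differ).
        hard = (logits > 0).float()
        soft = torch.sigmoid(logits)
        return hard + soft - soft.detach()


class GatedBackbone(nn.Module):
    """Toy backbone whose filters are switched on/off per input by the gater."""
    def __init__(self, in_channels=3, channels=32, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.gater = Gater(in_channels, num_gates=channels)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        gates = self.gater(x)                    # (batch, channels) in {0, 1}
        feats = F.relu(self.conv(x))             # (batch, channels, H, W)
        feats = feats * gates[:, :, None, None]  # zero out de-selected filters
        return self.head(feats.mean(dim=(2, 3)))


if __name__ == "__main__":
    model = GatedBackbone()
    logits = model(torch.randn(2, 3, 32, 32))
    print(logits.shape)  # torch.Size([2, 10])
```

In a full-scale version, the gater would emit gates for every convolutional layer of a deep backbone such as a ResNet, and the two networks would be trained jointly end to end.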
