Squeeze-and-Excitation Networks

Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, several recent approaches have shown the benefit of enhancing spatial encoding. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We demonstrate that by stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission, which won first place and significantly reduced the top-5 error to 2.251%, a ~25% relative improvement over the winning entry of 2016. Code and models are available at https://github.com/hujie-frank/SENet.
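
To make the recalibration mechanism concrete, below is a minimal sketch of an SE block in PyTorch. This is an illustration rather than the authors' released implementation: the squeeze step aggregates each channel's spatial response with global average pooling, and the excitation step passes the resulting descriptor through a two-layer bottleneck gated by a sigmoid to produce per-channel scaling weights. The reduction ratio r = 16 follows the paper's default; the class and variable names here are our own.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: recalibrates channel-wise feature
    responses by explicitly modelling interdependencies between channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: collapse each HxW feature map to a single channel statistic.
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        # Excitation: a two-layer bottleneck (dimensionality reduction by r,
        # then restoration) ending in a sigmoid yields gating weights in [0, 1].
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = self.squeeze(x).view(b, c)            # (B, C) channel descriptor
        w = self.excitation(s).view(b, c, 1, 1)   # (B, C, 1, 1) gating weights
        return x * w                              # rescale the feature maps


# Example: recalibrate the output of a convolutional stage.
features = torch.randn(8, 64, 32, 32)   # (batch, channels, height, width)
se = SEBlock(channels=64, reduction=16)
recalibrated = se(features)              # same shape, channel-wise rescaled
```

Because the block preserves the input shape, it can be dropped in after the convolutional stages of an existing architecture, which is how the SE variants of standard networks (e.g. SE-ResNet) are constructed by stacking.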