Adding Non-Linear Context to Deep Networks

Enormous success has been achieved with deep neural networks consisting of standard linear convolutions followed by simple non-linear mapping functions. In this paper, we add easily computed non-linear local and global statistics to deep architectures, augmenting the information available at each layer. This additional information is then processed in the same manner as the information already available at that layer. The summary statistics, which can be as simple as within-channel variance, introduce little run-time computational overhead and can be instantiated with few extra parameters. All standard training procedures can be used without modification to train these augmented networks. Through extensive testing with ResNet on ImageNet, we show performance improvements across a wide range of network sizes. Additionally, we provide a detailed study of where within the deep networks these statistics are most effective.
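As an illustration of the simplest statistic mentioned above, the following is a minimal sketch of computing a global within-channel variance and making it available alongside the original channels. The abstract does not say how the statistics are injected into a layer, so appending them as constant extra channels is an assumption made here for illustration, and the function names `channel_variances` and `augment_with_variance` are hypothetical. The sketch uses plain Python lists rather than a deep learning framework to stay dependency-free.

```python
from statistics import pvariance


def channel_variances(feature_map):
    """Return the population variance of each channel.

    feature_map is a list of C channels, each an H x W list of lists of floats.
    """
    return [
        pvariance([value for row in channel for value in row])
        for channel in feature_map
    ]


def augment_with_variance(feature_map):
    """Append one constant channel per input channel holding that channel's variance.

    Assumption: statistics are exposed to the next layer as extra channels.
    The output has 2 * C channels with the same H x W spatial size.
    """
    augmented = list(feature_map)
    for channel, var in zip(feature_map, channel_variances(feature_map)):
        height, width = len(channel), len(channel[0])
        augmented.append([[var] * width for _ in range(height)])
    return augmented


if __name__ == "__main__":
    fm = [
        [[1.0, 3.0], [1.0, 3.0]],  # channel 0: values vary
        [[2.0, 2.0], [2.0, 2.0]],  # channel 1: constant
    ]
    out = augment_with_variance(fm)
    print(len(out), channel_variances(fm))
```

Under this assumption, a following convolution would see the variance channels as ordinary inputs, which is one way to read the claim that the extra information is used in the same manner as the rest of the layer's input. The only added parameters would be the weights that the next layer applies to the new channels.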
