Refine or Represent: Residual Networks with Explicit Channel-wise Configuration

The success of deep residual learning rests mainly on one key insight: instead of learning a completely new representation y = H(x), it is much easier to learn and optimize the residual mapping F(x) = H(x) - x, since F(x) is generally closer to zero than the non-residual function H(x). In this paper, we exploit this insight further by explicitly configuring each feature channel with a fine-grained learning style. We define two channel-wise learning styles: Refine and Represent. A Refine channel is learned via the residual function y_i = F_i(x) + x_i with a regularization term on the channel response ||F_i(x)||, aiming to refine the layer's input feature channel x_i. A Represent channel directly learns a new representation y_i = H_i(x) without computing a residual with reference to x_i. We apply a random channel-wise configuration to each residual learning block. Experimental results on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that the proposed method substantially improves the performance of conventional residual networks, including ResNet, ResNeXt, and SENet.
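To make the channel-wise configuration concrete, here is a minimal sketch of how one residual block might mix Refine and Represent channels, assuming a PyTorch-style implementation. The class name RefineRepresentBlock, the refine_ratio and reg_weight parameters, and the reuse of a single convolutional branch as both F(x) and H(x) are illustrative assumptions rather than the paper's actual code.

```python
import torch
import torch.nn as nn


class RefineRepresentBlock(nn.Module):
    """Residual block with a random per-channel Refine/Represent configuration (illustrative sketch)."""

    def __init__(self, channels, refine_ratio=0.5, reg_weight=1e-4):
        super().__init__()
        # Shared branch: its output is read as the residual F_i(x) for Refine
        # channels and as the new representation H_i(x) for Represent channels.
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)
        self.reg_weight = reg_weight
        # Random channel-wise configuration: 1 -> Refine, 0 -> Represent.
        mask = (torch.rand(channels) < refine_ratio).float()
        self.register_buffer("refine_mask", mask.view(1, channels, 1, 1))

    def forward(self, x):
        out = self.branch(x)
        # Refine channels add the identity x_i (y_i = F_i(x) + x_i);
        # Represent channels keep the branch output as-is (y_i = H_i(x)).
        y = self.relu(out + self.refine_mask * x)
        # Regularize the channel response ||F_i(x)|| on Refine channels only.
        reg = self.reg_weight * (self.refine_mask * out).abs().mean()
        return y, reg
```

During training, the returned reg term would be added to the task loss, encouraging Refine channels to stay close to identity mappings while Represent channels remain free to learn entirely new features.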
