Training Better CNNs Requires Rethinking ReLU

With the rapid development of deep convolutional neural networks (DCNNs), numerous works have focused on designing better network architectures (e.g., AlexNet, VGG, Inception, ResNet, and DenseNet). Nevertheless, these networks share the same characteristic: each convolutional layer is followed by an activation layer, most commonly a Rectified Linear Unit (ReLU). In this work, we argue that the paired module with a 1:1 convolution-to-ReLU ratio is not the best choice, since it may harm generalization. We therefore investigate more suitable convolution-to-ReLU ratios for building better network architectures. Specifically, inspired by Leaky ReLU, we adopt a proportional module with an N:M (N$>$M) convolution-to-ReLU ratio. From the perspective of ensemble learning, Leaky ReLU can be viewed as an ensemble of networks with different convolution-to-ReLU ratios, and the analysis of a simple Leaky ReLU model suggests that the N:M (N$>$M) proportional module helps networks achieve better performance. By employing this module, many popular networks can form richer representations, since the N:M (N$>$M) proportional module uses information more effectively. Furthermore, we apply the module to diverse DCNN models to examine whether the N:M (N$>$M) convolution-to-ReLU ratio is indeed more effective. Our experimental results show that this simple yet effective method achieves better performance on different benchmarks with various network architectures, verifying the superiority of the proportional module.
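The ensemble view rests on the identity $\mathrm{LeakyReLU}(x) = \alpha x + (1-\alpha)\,\mathrm{ReLU}(x)$: a Leaky ReLU mixes a purely linear path (no activation) with a rectified path, which motivates letting some convolutions go unactivated. To make the N:M idea concrete, the sketch below shows one possible 2:1 proportional block in PyTorch, where two convolution + batch-normalization layers share a single ReLU instead of the standard one-ReLU-per-convolution pairing. The class name, layer widths, and the choice to place the single ReLU after the second convolution are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ProportionalBlock(nn.Module):
    """Illustrative N:M (here 2:1) convolution-to-ReLU block.

    Two conv + BN layers are stacked, but only the second is followed by a
    ReLU, giving a 2:1 conv-to-ReLU ratio instead of the usual 1:1 pairing.
    The placement of the activation is an assumption made for illustration.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.bn1(self.conv1(x))  # first conv: no activation (linear path)
        x = self.bn2(self.conv2(x))  # second conv
        return self.relu(x)          # one ReLU for two convs -> 2:1 ratio


if __name__ == "__main__":
    block = ProportionalBlock(3, 16)
    y = block(torch.randn(1, 3, 32, 32))
    print(y.shape)  # torch.Size([1, 16, 32, 32])
```

In a full network, several such blocks would replace the usual conv-ReLU pairs; varying N and M changes the fraction of linear paths and hence how much information passes through unrectified.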
