Elastic exponential linear units for convolutional neural networks

Abstract: Activation functions play an important role in determining the depth and non-linearity of deep learning models. Since the Rectified Linear Unit (ReLU) was introduced, many variants that intentionally inject noise have been proposed to avoid overfitting. The Exponential Linear Unit (ELU) and its variants with trainable parameters have been proposed to reduce the bias shift effect often observed with ReLU-type activation functions. In this paper, we propose a novel activation function, the Elastic Exponential Linear Unit (EELU), which combines the advantages of both types of activation functions in a generalized form. EELU has an elastic slope in its positive part and preserves negative signals through a small non-zero gradient. We also present a new strategy for injecting neuronal noise into the activation function via a Gaussian distribution to improve generalization. By visualizing the latent features of convolutional neural networks, we show that the random noise allows EELU to represent a wider variety of features than other activation functions. We evaluated the effectiveness of EELU through extensive image classification experiments on the CIFAR-10/CIFAR-100, ImageNet, and Tiny ImageNet datasets. Our experimental results show that EELU achieved better generalization and higher classification accuracy than conventional activation functions such as ReLU, ELU, ReLU- and ELU-like variants, Scaled ELU, and Swish. EELU also improved classification performance when trained on fewer samples, owing to its noise injection strategy, which permits significant variation in the function's outputs, including deactivation.
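
The abstract does not give the exact functional form of EELU, so the following is only a minimal sketch of the behaviour it describes: an elastic (randomly perturbed) slope on the positive side and an ELU-style negative part with a small non-zero gradient. The class name EELUSketch and the parameters alpha (negative-part scale) and sigma (std of the Gaussian slope noise) are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn


class EELUSketch(nn.Module):
    """Illustrative Elastic-ELU-style activation (assumed form, not the paper's exact equation)."""

    def __init__(self, alpha: float = 1.0, sigma: float = 0.1):
        super().__init__()
        self.alpha = alpha  # scale of the ELU-style negative part
        self.sigma = sigma  # std of the Gaussian noise applied to the positive slope

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # "Elastic" positive slope: per-element Gaussian noise centred at 1 (assumption).
            k = (1.0 + self.sigma * torch.randn_like(x)).clamp(min=0.0)
        else:
            # Deterministic behaviour at evaluation time.
            k = torch.ones_like(x)
        positive = k * x
        # ELU-style negative part keeps a small non-zero gradient for x <= 0.
        negative = self.alpha * (torch.exp(x) - 1.0)
        return torch.where(x > 0, positive, negative)


# Usage: apply to a convolutional feature map during training.
act = EELUSketch(alpha=1.0, sigma=0.1)
act.train()
out = act(torch.randn(4, 8, 16, 16))
```

In this sketch the noise enters through the slope rather than the input, which matches the abstract's claim that noisy slopes (including near-zero ones that effectively deactivate a unit) produce more varied feature responses during training while the unit stays deterministic at test time.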
