Learn-able parameter guided Activation Functions

In this paper, the concept of adding learn-able slope and mean shift parameters to an activation function to improve its total response region is explored. The characteristics of an activation function depend strongly on the values of its parameters. Making the parameters learn-able makes the activation function more dynamic and able to adapt to the requirements of its neighboring layers. The introduced slope parameter is independent of the other parameters in the activation function. The concept was applied to ReLU to develop the Dual Line and Dual Parametric ReLU activation functions. Evaluation on MNIST and CIFAR10 shows that the proposed activation function Dual Line achieves a top-5 position for mean accuracy among the 43 activation functions tested with the LENET4, LENET5, and WideResNet architectures. This is the first time more than 40 activation functions have been analyzed on the MNIST and CIFAR10 datasets at the same time. The study of the distribution of the positive slope parameter \(\beta\) indicates that the activation function adapts to the requirements of its neighboring layers. The study shows that model performance increases with the proposed activation functions.
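The abstract gives no formulas or code, so the following is only an illustrative sketch of the general idea of a learn-able slope and mean shift: it assumes an activation of the form f(x) = β·max(x, 0) + α·min(x, 0) + m, with per-channel learn-able slopes α, β and mean shift m. The functional form, parameter names, and initial values here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LearnableSlopeActivation(nn.Module):
    """Illustrative sketch of a learn-able slope + mean shift activation.

    Assumed form (not taken from the paper):
        f(x) = beta * max(x, 0) + alpha * min(x, 0) + shift
    with one learn-able alpha, beta, and shift per channel.
    """

    def __init__(self, num_channels, alpha_init=0.01, beta_init=1.0, shift_init=0.0):
        super().__init__()
        # Per-channel learn-able parameters; initial values are assumptions.
        self.alpha = nn.Parameter(torch.full((num_channels,), alpha_init))
        self.beta = nn.Parameter(torch.full((num_channels,), beta_init))
        self.shift = nn.Parameter(torch.full((num_channels,), shift_init))

    def forward(self, x):
        # Reshape (C,) parameters to broadcast over (N, C, ...) inputs.
        shape = (1, -1) + (1,) * (x.dim() - 2)
        alpha = self.alpha.view(shape)
        beta = self.beta.view(shape)
        shift = self.shift.view(shape)
        # Positive part scaled by beta, negative part by alpha, then mean shift.
        return beta * torch.clamp(x, min=0) + alpha * torch.clamp(x, max=0) + shift
```

In use, such a module would simply replace a fixed ReLU layer, e.g. `LearnableSlopeActivation(num_channels=64)` after a 64-channel convolution, so the optimizer learns the slopes and shift jointly with the network weights.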
