Hexpo: A vanishing-proof activation function

This article proposes “Hexpo”, an activation function that can scale the gradient and thereby overcome the vanishing gradient problem. Unlike the rectified linear unit (ReLU) family, which applies an identity mapping to positive inputs, Hexpo has scalable saturation limits on both positive and negative values. Through parametrization, both the active domain of Hexpo and the output range it maps to are flexible, so it can alleviate the vanishing gradient problem from both the gradient-flow and the local-gradient perspective while preserving upper and lower bounds on the output. Parametrization also allows Hexpo to produce outputs close to zero. In experiments on the MNIST handwritten digit recognition dataset, Hexpo outperforms the ReLU family (rectified linear unit and exponential linear unit) in both accuracy and learning speed. In experiments on the CIFAR-10 tiny image recognition dataset with convolutional layers, Hexpo outperforms the rectified linear unit and performs comparably to the exponential linear unit.
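
The abstract does not reproduce Hexpo's definition. The NumPy sketch below assumes the piecewise-exponential form reported for Hexpo, f(x) = −a(e^(−x/b) − 1) for x ≥ 0 and f(x) = c(e^(x/d) − 1) for x < 0, where a, b, c, and d are the scaling parameters the abstract alludes to; the exact form and the default parameter values here should be treated as assumptions rather than the paper's verbatim definition.

```python
import numpy as np

def hexpo(x, a=1.0, b=1.0, c=1.0, d=1.0):
    """Hexpo activation (assumed piecewise-exponential form).

    f(x) = -a * (exp(-x / b) - 1)  for x >= 0  (saturates at +a)
    f(x) =  c * (exp( x / d) - 1)  for x <  0  (saturates at -c)

    The parameters a, b, c, d scale the output bounds and the slope,
    which is what lets Hexpo rescale gradients while staying bounded.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0,
                    -a * (np.exp(-x / b) - 1.0),
                    c * (np.exp(x / d) - 1.0))

def hexpo_grad(x, a=1.0, b=1.0, c=1.0, d=1.0):
    """Derivative of the assumed Hexpo form.

    Equals a/b as x -> 0+ and c/d as x -> 0-, so choosing a/b = c/d
    keeps the gradient continuous at the origin.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0,
                    (a / b) * np.exp(-x / b),
                    (c / d) * np.exp(x / d))

# Example: outputs stay inside (-c, a), and the gradient never dies
# abruptly the way a ReLU's does on the negative side.
x = np.linspace(-5.0, 5.0, 5)
print(hexpo(x))       # bounded activations
print(hexpo_grad(x))  # strictly positive, scalable gradients
```

Under this assumed form, enlarging a (or c) raises the saturation bound and, via the a/b (or c/d) ratio, the local gradient, which matches the abstract's claim that parametrization addresses vanishing gradients from both the gradient-flow and the local-gradient side.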
