Parametric Exponential Linear Unit for Deep Convolutional Neural Networks

Object recognition is an important task for improving the ability of visual systems to perform complex scene understanding. Recently, the Exponential Linear Unit (ELU) has been proposed as a key component for managing bias shift in Convolutional Neural Networks (CNNs), but defines a parameter that must be set by hand. In this paper, we propose learning a parameterization of ELU in order to learn the proper activation shape at each layer in the CNNs. Our results on the MNIST, CIFAR-10/100 and ImageNet datasets using the NiN, Overfeat, All-CNN and ResNet networks indicate that our proposed Parametric ELU (PELU) has better performances than the non-parametric ELU. We have observed as much as a 7.28% relative error improvement on ImageNet with the NiN network, with only 0.0003% parameter increase. Our visual examination of the non-linear behaviors adopted by Vgg using PELU shows that the network took advantage of the added flexibility by learning different activations at different layers.

[1]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[3]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[4]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[5]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[6]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[7]  Bernt Schiele,et al.  What Makes for Effective Detection Proposals? , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[9]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[10]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[11]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[12]  Jürgen Schmidhuber,et al.  Training Very Deep Networks , 2015, NIPS.

[13]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[14]  Pierre Baldi,et al.  Learning Activation Functions to Improve Deep Neural Networks , 2014, ICLR.

[15]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[16]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[18]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[19]  Razvan Pascanu,et al.  Natural Neural Networks , 2015, NIPS.

[20]  Anish Shah,et al.  Deep Residual Networks with Exponential Linear Unit , 2016, ArXiv.

[21]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[22]  Sepp Hochreiter,et al.  The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions , 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[23]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Tianqi Chen,et al.  Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.

[26]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yunchao Wei,et al.  Deep Learning with S-Shaped Rectified Linear Activation Units , 2015, AAAI.

[28]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[29]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[30]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[31]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[32]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.