A novel activation function for multilayer feed-forward neural networks

Traditional activation functions such as the hyperbolic tangent and the logistic sigmoid were long the default choice in artificial neural networks. In practice, however, they have fallen out of favor, largely because of the performance gap observed on recognition and classification tasks relative to more recent alternatives such as the rectified linear unit and maxout. In this paper, we introduce a simple new activation function for multilayer feed-forward architectures. Unlike approaches that design new activation functions by discarding many of the mainstays of traditional activation-function design, our proposed function builds on those mainstays and therefore shares most of the properties of traditional activation functions. Nevertheless, it differs from them on two major points: its asymptote and its global extrema. Defining a function that attains a global maximum and a global minimum turned out to be critical in our design process, since we believe the absence of such extrema is one of the main reasons behind the performance gap between traditional activation functions and their recently introduced counterparts. We evaluate the effectiveness of the proposed activation function on four commonly used datasets: MNIST, CIFAR-10, CIFAR-100, and Pang and Lee's movie-review dataset. Experimental results demonstrate that the proposed function applies effectively across these datasets, achieving accuracy competitive with the state of the art for the same network topology. In particular, the proposed activation function outperforms state-of-the-art methods on the MNIST dataset.
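
To make the distinction concrete, the sketch below contrasts a traditional asymptotic activation with a bounded function that actually attains a global maximum and minimum. The function x / cosh(x) used here is an illustrative stand-in chosen only because it has the stated properties, not necessarily the formula proposed in the paper; the function names and the NumPy implementation are assumptions for illustration.

import numpy as np

def tanh_activation(x):
    # Traditional activation: asymptotes to -1 and +1 but never
    # attains a global maximum or minimum at any finite input.
    return np.tanh(x)

def relu_activation(x):
    # Rectified linear unit: unbounded above, so no global maximum.
    return np.maximum(0.0, x)

def bounded_extremum_activation(x):
    # Illustrative stand-in (not necessarily the paper's function):
    # x / cosh(x) asymptotes to 0 as |x| grows and attains a global
    # maximum (about 0.663 near x = 1.2) and a symmetric global minimum.
    return x / np.cosh(x)

if __name__ == "__main__":
    xs = np.linspace(-6.0, 6.0, 7)
    # Unlike tanh, the output decays back toward 0 in both tails,
    # so the extrema are attained at finite inputs.
    print(bounded_extremum_activation(xs))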
