A novel activation function for multilayer feed-forward neural networks

Traditional activation functions such as the hyperbolic tangent and the logistic sigmoid were long the default choice in artificial neural networks. In practice, however, they have fallen out of favor, largely because of the performance gap observed on recognition and classification tasks relative to more recent alternatives such as the rectified linear unit and maxout. In this paper, we introduce a simple new activation function for multilayer feed-forward architectures. Unlike approaches that design new activation functions by discarding many of the mainstays of traditional activation-function design, our proposed function builds on those mainstays and therefore shares most of the properties of traditional activation functions. Nevertheless, it differs from them on two major points: its asymptote and its global extrema. Defining a function that attains a global maximum and a global minimum turned out to be critical in our design process, since we believe the absence of such extrema is one of the main reasons behind the performance gap between traditional activation functions and their recently introduced counterparts. We evaluate the effectiveness of the proposed activation function on four commonly used datasets: MNIST, CIFAR-10, CIFAR-100, and Pang and Lee's movie-review dataset. Experimental results demonstrate that the proposed function applies effectively across these datasets, achieving accuracy competitive with the state of the art for the same network topology. In particular, the proposed activation function outperforms state-of-the-art methods on the MNIST dataset.
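
To make the distinction concrete, the sketch below contrasts a traditional asymptotic activation with a bounded function that actually attains a global maximum and minimum. The function x / cosh(x) used here is an illustrative stand-in chosen only because it has the stated properties, not necessarily the formula proposed in the paper; the function names and the NumPy implementation are assumptions for illustration.

import numpy as np

def tanh_activation(x):
    # Traditional activation: asymptotes to -1 and +1 but never
    # attains a global maximum or minimum at any finite input.
    return np.tanh(x)

def relu_activation(x):
    # Rectified linear unit: unbounded above, so no global maximum.
    return np.maximum(0.0, x)

def bounded_extremum_activation(x):
    # Illustrative stand-in (not necessarily the paper's function):
    # x / cosh(x) asymptotes to 0 as |x| grows and attains a global
    # maximum (about 0.663 near x = 1.2) and a symmetric global minimum.
    return x / np.cosh(x)

if __name__ == "__main__":
    xs = np.linspace(-6.0, 6.0, 7)
    # Unlike tanh, the output decays back toward 0 in both tails,
    # so the extrema are attained at finite inputs.
    print(bounded_extremum_activation(xs))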
