Linearized sigmoidal activation: A novel activation function with tractable non-linear characteristics to boost representation capability

Abstract Activation functions play a crucial role in the discriminative capability of deep neural networks, and they are one of the main reasons for the revival of neural networks. Although recent activation functions address the vanishing and exploding gradient problems, they lack sufficient capacity to model non-linear data. This paper proposes a novel activation function that gives neural networks the capability to model non-linear dependencies in data. The proposed activation function remains non-saturating despite its non-linear structure, and it applies distinct activation behaviors to different segments of the input range. The paper presents two variants of the proposed function. The first, the "linear sigmoidal activation" function, is a fixed-structure activation function whose coefficients are defined at the start of model design. The second, the "adaptive linear sigmoidal activation" function, is a trainable function that can adapt itself to the complexity of the given data. Both proposed models are tested against state-of-the-art activation functions on benchmark datasets (CIFAR-10, MNIST, SVHN, FER-2013), and they outperform the compared activation functions in all tests.
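
To make the segment-wise idea concrete, the following is a minimal NumPy sketch of a piecewise activation in this spirit; the coefficients a, b, t and the exact piecewise form are illustrative assumptions, not the definition given in the paper. Inside a bounded input region the response is sigmoidal, while outside it a linear extension keeps the output from saturating. In the adaptive variant described above, a, b, and t would be treated as trainable parameters updated by backpropagation rather than fixed at design time.

import numpy as np

def linear_sigmoidal(x, a=1.0, b=0.5, t=2.0):
    # Hypothetical fixed coefficients: 'a' scales the output, 'b' is the slope
    # of the linear tails, and 't' marks where the sigmoidal segment ends.
    s = 1.0 / (1.0 + np.exp(-np.clip(x, -t, t)))      # sigmoidal core on [-t, t]
    tail = b * (np.abs(x) - t) * np.sign(x)           # linear, non-saturating extension
    return a * np.where(np.abs(x) <= t, s, s + tail)  # continuous at |x| = t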
