Bi-modal derivative activation function for sigmoidal feedforward networks

A new class of activation functions is proposed, constructed as the sum of shifted log-sigmoid activation functions. The effect of this construction is that the derivative of the activation function with respect to the net input is bi-modal: for nonzero values of the parameter that parametrises the proposed class, the derivative has two maxima of equal value. On a set of ten function approximation tasks, there exist networks using the proposed activation function that achieve lower generalisation error in an equal number of training epochs with the resilient backpropagation algorithm. On a set of four benchmark problems taken from the UCI machine learning repository, with networks trained using the resilient backpropagation algorithm, the scaled conjugate gradient algorithm, the Levenberg-Marquardt algorithm and the quasi-Newton BFGS algorithm, the proposed activation functions again lead to better generalisation results, consistent with the results obtained on the ten function approximation tasks.
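
To make the construction concrete, the Python/NumPy sketch below illustrates one plausible member of such a class: the sum of two log-sigmoids shifted by plus and minus a parameter k, scaled by one half. The shift k, the scaling factor, and the function names are illustrative assumptions for this sketch only, not the paper's exact parametrisation; the sketch only demonstrates that the resulting derivative has two maxima of equal value for nonzero k.

import numpy as np

def log_sigmoid(x):
    # Standard log-sigmoid: 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def bimodal_activation(x, k=2.0):
    # Hypothetical member of the proposed class: average of two
    # log-sigmoids shifted by +k and -k. The shift k and the 1/2
    # scaling are assumptions made for illustration.
    return 0.5 * (log_sigmoid(x + k) + log_sigmoid(x - k))

def bimodal_derivative(x, k=2.0):
    # Derivative of the sum of shifted log-sigmoids, using
    # sigma'(z) = sigma(z) * (1 - sigma(z)) for each term.
    s1 = log_sigmoid(x + k)
    s2 = log_sigmoid(x - k)
    return 0.5 * (s1 * (1.0 - s1) + s2 * (1.0 - s2))

if __name__ == "__main__":
    xs = np.linspace(-8.0, 8.0, 1601)
    d = bimodal_derivative(xs, k=2.0)
    # For k != 0 the derivative is bi-modal: two maxima of equal
    # value, located symmetrically near x = -k and x = +k.
    peaks = xs[np.isclose(d, d.max())]
    print("derivative maxima near:", peaks)

For k = 0 the two shifted terms coincide and the derivative collapses back to a single maximum at the origin, which is why the bi-modal behaviour appears only for nonzero values of the parameter.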
