Two Frameworks for Improving Gradient-Based Learning Algorithms

Backpropagation is the most popular algorithm for training neural networks. However, this gradient-based method is prone to very long training times and convergence to local optima. Various methods have been proposed to alleviate these issues, including, but not limited to, different training algorithms, automatic architecture design, and different transfer functions. In this chapter we continue the exploration into improving gradient-based learning algorithms through dynamic transfer function modification. We propose opposite transfer functions as a means to improve the numerical conditioning of neural networks and, building on this idea, derive two backpropagation-based learning algorithms. Our experimental results show an improvement in accuracy and generalization ability on common benchmark functions. The experiments also examine the sensitivity of the approach to the learning parameters, the type of transfer function, and the number of neurons in the network.
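
To make the core idea concrete, below is a minimal sketch of a transfer function paired with an "opposite" counterpart. It assumes the opposite of a transfer function f(x) is its mirror image f(-x) (for the logistic sigmoid this equals 1 - sigmoid(x)); this particular definition and the helper names are illustrative assumptions, not necessarily the chapter's exact formulation.

import numpy as np

def sigmoid(x):
    # Standard logistic transfer function.
    return 1.0 / (1.0 + np.exp(-x))

def opposite_sigmoid(x):
    # Assumed opposite transfer function: the sigmoid mirrored about x = 0.
    # For the logistic sigmoid, sigmoid(-x) == 1 - sigmoid(x).
    return sigmoid(-x)

# A learning algorithm in this spirit could, during training, probe whether
# swapping a neuron's transfer function for its opposite lowers the network
# error, and keep whichever choice conditions the error surface better.
x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(opposite_sigmoid(x), 1.0 - sigmoid(x)))  # True

The check at the end simply confirms the mirror-image identity for the logistic sigmoid; other transfer functions would need their own opposite defined analogously.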
