A Very Fast Learning Method for Neural Networks Based on Sensitivity Analysis

This paper introduces a learning method for two-layer feedforward neural networks based on sensitivity analysis, which uses a linear training algorithm for each of the two layers. First, random values are assigned to the outputs of the first layer; these initial values are then updated on the basis of sensitivity formulas, which use the weights of each layer, and the process is repeated until convergence. Since the weights are learnt by solving a linear system of equations, there is a significant saving in computational time. The method also provides, at no extra computational cost, the local sensitivities of the least-squares errors with respect to the input and output data, because the necessary information is already available from the training process. This method, called the Sensitivity-Based Linear Learning Method, can also be used to provide an initial set of weights, which significantly improves the behavior of other learning algorithms. The theoretical basis for the method is given, and its performance is illustrated by applying it to several examples, in which it is compared with other learning algorithms on well-known data sets. The results show a learning speed that is generally faster than that of existing methods. In addition, the method can be used as an initialization tool for other well-known algorithms, with significant improvements.
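
The per-layer linear training idea can be sketched roughly as follows. The snippet below is a minimal illustration, not the authors' exact SBLLM formulation: it assumes tanh activations, linearizes each layer with the inverse activation (artanh) so that its weights can be obtained by ordinary least squares given a fixed guess of the hidden-layer outputs, and replaces the paper's sensitivity-based update of those hidden outputs with a simple gradient step. The function names (`layer_lstsq`, `sbllm_like_fit`, `predict`) and all numerical choices are hypothetical.

```python
import numpy as np

def layer_lstsq(inputs, desired_outputs):
    """Find W such that tanh(inputs @ W) ~ desired_outputs by linearizing
    with the inverse activation (artanh) and solving a least-squares problem."""
    targets = np.arctanh(np.clip(desired_outputs, -0.999, 0.999))
    W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return W

def sbllm_like_fit(X, Y, n_hidden, n_iter=100, step=0.1, rng=None):
    """Sketch of alternating per-layer linear training for a two-layer network."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])              # inputs plus bias column
    Z = np.tanh(rng.standard_normal((n, n_hidden)))   # random initial hidden outputs
    for _ in range(n_iter):
        Zb = np.hstack([Z, np.ones((n, 1))])
        W1 = layer_lstsq(Xb, Z)   # layer 1: map inputs to current hidden outputs
        W2 = layer_lstsq(Zb, Y)   # layer 2: map hidden outputs to targets
        # Simplified update of the hidden outputs (a stand-in for the paper's
        # sensitivity formulas, used here only for illustration): move Z toward
        # what layer 1 produces and down the gradient of the layer-2 error.
        Z_from_x = np.tanh(Xb @ W1)
        pred = np.tanh(Zb @ W2)
        grad = ((pred - Y) * (1 - pred ** 2)) @ W2[:-1].T
        Z = np.clip((1 - step) * Z + step * Z_from_x - step * grad, -0.999, 0.999)
    return W1, W2

def predict(X, W1, W2):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    Zb = np.hstack([np.tanh(Xb @ W1), np.ones((X.shape[0], 1))])
    return np.tanh(Zb @ W2)

# Usage: approximate a scaled sine on [-1, 1]
X = np.linspace(-1, 1, 200).reshape(-1, 1)
Y = 0.9 * np.sin(np.pi * X)
W1, W2 = sbllm_like_fit(X, Y, n_hidden=10)
print("RMSE:", np.sqrt(np.mean((predict(X, W1, W2) - Y) ** 2)))
```

The point of the sketch is the cost structure the abstract refers to: each iteration reduces to two linear least-squares solves (one per layer) instead of many nonlinear gradient steps, which is where the claimed saving in computational time comes from.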
