CONVERGENCE OF GRADIENT METHOD WITH MOMENTUM FOR BACK-PROPAGATION NEURAL NETWORKS

This work considers a gradient method with momentum for training back-propagation (BP) neural networks. The momentum coefficient is chosen adaptively to accelerate and stabilize the learning of the network weights, and the corresponding convergence results are proved.
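To make the idea of an adaptively chosen momentum coefficient concrete, here is a minimal sketch of a momentum update on a toy quadratic loss. The abstract does not state the adaptive rule, so the rule below is an assumption made for illustration only: the coefficient `tau` is scaled so that the momentum term never exceeds `mu` times the current gradient norm, and it is set to zero when the previous step is zero. The names `momentum_descent`, `quadratic_grad`, `eta`, and `mu` are hypothetical and do not come from the paper.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def momentum_descent(grad, w0, eta=0.05, mu=0.01, steps=500):
    """Gradient descent with an adaptively scaled momentum term.

    Update: step_k = -eta * g_k + tau_k * step_{k-1}
    Assumed rule (not taken from the paper):
        tau_k = mu * ||g_k|| / ||step_{k-1}||  if step_{k-1} != 0, else 0,
    so the momentum term has norm mu * ||g_k||. With mu < eta, each step
    is a descent direction.
    """
    w = list(w0)
    prev_step = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        prev_norm = norm(prev_step)
        # Adaptive momentum coefficient; zero when there is no previous step.
        tau = mu * norm(g) / prev_norm if prev_norm > 0.0 else 0.0
        step = [-eta * gi + tau * si for gi, si in zip(g, prev_step)]
        w = [wi + si for wi, si in zip(w, step)]
        prev_step = step
    return w


def quadratic_grad(w):
    """Gradient of the toy loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [1.0 * w[0], 10.0 * w[1]]


w_final = momentum_descent(quadratic_grad, [1.0, 1.0])
```

For this toy loss the iterates approach the minimizer at the origin. A real BP network would replace `quadratic_grad` with the gradient of the network error with respect to the weights.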
