Convergence of Cyclic and Almost-Cyclic Learning With Momentum for Feedforward Neural Networks

Two backpropagation algorithms with momentum for feedforward neural networks with a single hidden layer are considered. It is assumed that the training samples are supplied to the network in a cyclic or an almost-cyclic fashion during learning, i.e., in each training cycle every sample of the training set is presented to the network exactly once, in a fixed or a stochastic order, respectively. A restart strategy for the momentum is adopted: the momentum coefficient is set to zero at the beginning of each training cycle. Corresponding weak and strong convergence results are then proved, showing that the gradient of the error function tends to zero and that the weight sequence converges to a fixed point, respectively. The convergence conditions on the learning rate, the momentum coefficient, and the activation functions are considerably weaker than those required by existing results.
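
To make the training scheme concrete, the following is a minimal sketch (not the authors' code) of cyclic and almost-cyclic learning for a single-hidden-layer network with momentum, where the momentum buffer is zeroed at the start of every cycle as described above. The network size, learning rate eta, and momentum coefficient mu are illustrative choices, not the conditions derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(X, y, hidden=8, eta=0.05, mu=0.5, cycles=300, almost_cyclic=False):
    """Single-hidden-layer network, tanh hidden units, linear output,
    squared error, per-sample gradient descent with momentum."""
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))   # input -> hidden weights
    w2 = rng.normal(scale=0.5, size=hidden)        # hidden -> output weights
    for cycle in range(cycles):
        # Momentum restart: previous weight increments are discarded at cycle start.
        dW1_prev = np.zeros_like(W1)
        dw2_prev = np.zeros_like(w2)
        # Cyclic: fixed order; almost-cyclic: a fresh random permutation,
        # but every sample is still presented exactly once per cycle.
        order = rng.permutation(n) if almost_cyclic else np.arange(n)
        for i in order:
            h = np.tanh(X[i] @ W1)                 # hidden activations
            out = h @ w2                           # network output
            err = out - y[i]                       # residual for this sample
            # Backpropagated gradients of 0.5 * err**2
            gw2 = err * h
            gW1 = np.outer(X[i], err * w2 * (1.0 - h ** 2))
            # Gradient step plus momentum term
            dw2 = -eta * gw2 + mu * dw2_prev
            dW1 = -eta * gW1 + mu * dW1_prev
            w2 += dw2
            W1 += dW1
            dw2_prev, dW1_prev = dw2, dW1
    return W1, w2

# Toy usage: fit a smooth 1-D target with almost-cyclic sample ordering.
X = np.linspace(-1, 1, 40).reshape(-1, 1)
y = np.sin(np.pi * X[:, 0])
W1, w2 = train(X, y, almost_cyclic=True)
pred = np.tanh(X @ W1) @ w2
print("mean squared error:", np.mean((pred - y) ** 2))
```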
