When Does Online BP Training Converge?

Backpropagation (BP) neural networks have been widely applied in scientific research and engineering. The success of these applications, however, relies on the convergence of the training procedure involved in neural network learning. We address this convergence analysis issue by proving two fundamental theorems on the convergence of the online BP training procedure. One theorem states that, under mild conditions, the gradient sequence of the error function converges to zero (weak convergence); the other establishes the convergence of the weight sequence defined by the procedure to a fixed point at which the error function attains its minimum (strong convergence). The weak convergence theorem sharpens and generalizes existing convergence analyses, while the strong convergence theorem provides new results on the convergence of the online BP training procedure. The results reveal that, with any analytic sigmoid activation function, the online BP training procedure is always convergent, which underlies the successful application of BP neural networks.
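To make the two notions of convergence concrete, the following is a minimal sketch in our own notation (the symbols w_k, \eta_k, E_j, and w^* are illustrative assumptions, not taken from the paper):

\begin{align*}
  % Online BP update: one training sample j(k) is presented per step, with
  % learning rate \eta_k, per-sample error E_j, and total error E(w) = \sum_j E_j(w)
  w_{k+1} &= w_k - \eta_k \,\nabla E_{j(k)}(w_k), \\
  % Weak convergence: the gradient sequence of the total error vanishes
  \lim_{k\to\infty} \bigl\|\nabla E(w_k)\bigr\| &= 0, \\
  % Strong convergence: the weight sequence itself converges to a minimizer w^*
  \lim_{k\to\infty} w_k &= w^{*}, \qquad E(w^{*}) = \min_{w} E(w).
\end{align*}

Weak convergence concerns only the gradients of the error function, while strong convergence is the stronger statement that the weight iterates themselves settle at a minimizing point.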
