STABILITY PROPERTIES OF THE GRADIENT PROJECTION METHOD WITH APPLICATIONS TO THE BACKPROPAGATION ALGORITHM

Convergence properties of the generalized gradient projection algorithm in the presence of data perturbations are investigated. It is shown that every trajectory of the method is attracted, in a certain sense, to an ε-stationary set of the problem, where ε depends on the magnitude of the perturbations. Estimates for the attraction sets of the iterates are given in the general (nonsmooth and nonconvex) case. In the convex case, our results imply convergence to an ε-optimal set. The results are further strengthened for weakly sharp and strongly convex problems. Convergence of the parallel algorithm in the case of an additive objective function is established. One of the principal applications of our results is the stability analysis of the classical backpropagation algorithm for training artificial neural networks.
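To make the attraction statement concrete, the following is a minimal sketch of a gradient projection iteration with bounded gradient perturbations. It is not the generalized method analyzed in the paper: it assumes a smooth, strongly convex quadratic objective, a box feasible set, a constant step size, and uniformly distributed perturbations of magnitude at most `eps`. All names (`project_box`, `perturbed_gradient_projection`, `grad_f`) are hypothetical and chosen only for illustration.

```python
import random


def project_box(x, lo, hi):
    """Euclidean projection of x onto the box [lo, hi]^n (assumed feasible set)."""
    return [min(max(xi, lo), hi) for xi in x]


def perturbed_gradient_projection(grad, x0, step, eps, iters,
                                  lo=-1.0, hi=1.0, seed=0):
    """Iterate x <- P(x - step * (grad(x) + delta)) with |delta_i| <= eps.

    The uniform perturbation model and the constant step are assumptions
    of this sketch, not choices taken from the paper.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Perturb each gradient component by at most eps in absolute value.
        noisy = [gi + rng.uniform(-eps, eps) for gi in g]
        x = project_box([xi - step * ni for xi, ni in zip(x, noisy)], lo, hi)
    return x


def grad_f(x):
    """Gradient of the toy objective f(x) = 0.5 * ||x - c||^2, c = (0.5, 2.0).

    Over the box [-1, 1]^2 the constrained minimizer is (0.5, 1.0).
    """
    c = (0.5, 2.0)
    return [xi - ci for xi, ci in zip(x, c)]


if __name__ == "__main__":
    x = perturbed_gradient_projection(grad_f, [-1.0, -1.0],
                                      step=0.5, eps=0.1, iters=200)
    print(x)
```

In this toy setting the first coordinate obeys the error recursion e ← 0.5·e − 0.5·δ with |δ| ≤ eps, so it stays within roughly eps of 0.5 once the initial error has decayed, while the second coordinate is held at the bound 1.0 by the projection. The iterates therefore settle in a neighborhood of the solution whose size shrinks with the perturbation magnitude, which is the qualitative behavior the abstract describes for the strongly convex case.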
