Convergence properties of backpropagation for neural nets via theory of stochastic gradient methods. Part 1

We study the convergence properties of the serial and parallel backpropagation algorithms for training neural nets, as well as their modification with a momentum term. It is shown that these algorithms can be put into the general framework of stochastic gradient methods. This makes it possible to treat in a unified way both stochastic and deterministic rules for selecting the components (training examples) of the error function to minimize at each iteration. We obtain weaker conditions on the stepsize for the deterministic case and provide a quite general synchronization rule for the parallel version.
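As a rough illustration of the setting described above, the sketch below runs an incremental gradient method with a momentum term on an error function that is a sum of per-example squared errors, selecting one training example per iteration either cyclically (a deterministic rule) or at random (a stochastic rule). It is not the algorithm or the stepsize conditions analyzed in the paper: the two-parameter linear model standing in for a network, the names `example_gradient` and `train`, the diminishing stepsize schedule, and the momentum value are all assumptions made for the example, and the parallel version is not shown.

```python
import random


def example_gradient(w, x, y):
    # Gradient of the per-example error 0.5 * (w[0] * x + w[1] - y) ** 2
    # with respect to the weights w (hypothetical stand-in for a network).
    residual = w[0] * x + w[1] - y
    return [residual * x, residual]


def train(data, rule="cyclic", epochs=200, momentum=0.5, seed=0):
    # Incremental gradient method with momentum: each iteration uses the
    # gradient of a single component (training example) of the error function.
    rng = random.Random(seed)
    w = [0.0, 0.0]  # weights
    v = [0.0, 0.0]  # momentum term (previous weight change)
    n = len(data)
    for k in range(epochs * n):
        # Selection rule: deterministic (cyclic) or stochastic (uniform random).
        i = k % n if rule == "cyclic" else rng.randrange(n)
        x, y = data[i]
        g = example_gradient(w, x, y)
        # Diminishing stepsize; this schedule is an assumption for the sketch.
        step = 0.5 / (1.0 + k / n)
        v = [momentum * vj - step * gj for vj, gj in zip(v, g)]
        w = [wj + vj for wj, vj in zip(w, v)]
    return w
```

Calling `train(data, rule="cyclic")` and `train(data, rule="random")` on the same list of `(x, y)` pairs runs the two selection rules through identical update code, which is the sense in which both can be viewed within one framework.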
