Training Multilayer Perceptrons Via Minimization of Sum of Ridge Functions

Motivated by the problem of training multilayer perceptrons in neural networks, we consider the problem of minimizing $E(x) = \sum_{i=1}^{n} f_i(\xi_i \cdot x)$, where $\xi_i \in \mathbb{R}^s$ for $1 \le i \le n$, so that each summand $f_i(\xi_i \cdot x)$ is a ridge function. We show that when $n$ is small, minimizing $E$ reduces to minimizing univariate functions. When $n$ is moderately large, we apply gradient algorithms to minimize $E$. For large $n$, we present online gradient algorithms and, in particular, establish their monotonicity and weak convergence.
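The gradient of a single summand follows from the chain rule: $\nabla_x f_i(\xi_i \cdot x) = f_i'(\xi_i \cdot x)\,\xi_i$. Below is a minimal sketch of the online (sample-by-sample) gradient update described in the abstract, assuming each $f_i$ is differentiable; the function name `online_gradient`, the constant learning rate `eta`, and the fixed epoch count are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def online_gradient(xi, f_prime, x0, eta=0.01, epochs=100):
    """Online gradient descent for E(x) = sum_i f_i(xi_i . x).

    xi      : (n, s) array whose rows are the ridge directions xi_i
    f_prime : list of n callables, f_prime[i](t) = f_i'(t)
    x0      : (s,) initial weight vector
    eta     : constant learning rate (illustrative choice)
    """
    x = np.asarray(x0, dtype=float).copy()
    n = xi.shape[0]
    for _ in range(epochs):
        for i in range(n):  # one summand at a time: the "online" pass
            t = xi[i] @ x
            # chain rule: gradient of f_i(xi_i . x) w.r.t. x is f_i'(t) * xi_i
            x -= eta * f_prime[i](t) * xi[i]
    return x

# Example: f_i(t) = (t - y_i)^2 / 2, so f_i'(t) = t - y_i (the LMS case).
rng = np.random.default_rng(0)
xi = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
fp = [lambda t, yi=yi: t - yi for yi in y]
x_star = online_gradient(xi, fp, x0=np.zeros(3), eta=0.05, epochs=200)
```

With the quadratic choice of $f_i$ shown in the example, the update reduces to the classical least-mean-squares rule, which is the simplest instance of the online gradient algorithms analyzed here.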
