Training Multilayer Perceptrons Via Minimization of Sum of Ridge Functions

Motivated by the problem of training multilayer perceptrons in neural networks, we consider the problem of minimizing $E(x) = \sum_{i=1}^{n} f_i(\xi_i \cdot x)$, where $\xi_i \in \mathbb{R}^s$ for $1 \le i \le n$, so that each summand $f_i(\xi_i \cdot x)$ is a ridge function. We show that when $n$ is small, minimizing $E$ reduces to minimizing univariate functions. When $n$ is moderately large, we apply gradient algorithms to minimize $E$. For large $n$, we present online gradient algorithms and, in particular, establish their monotonicity and weak convergence.
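The gradient of a single summand follows from the chain rule: $\nabla_x f_i(\xi_i \cdot x) = f_i'(\xi_i \cdot x)\,\xi_i$. Below is a minimal sketch of the online (sample-by-sample) gradient update described in the abstract, assuming each $f_i$ is differentiable; the function name `online_gradient`, the constant learning rate `eta`, and the fixed epoch count are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def online_gradient(xi, f_prime, x0, eta=0.01, epochs=100):
    """Online gradient descent for E(x) = sum_i f_i(xi_i . x).

    xi      : (n, s) array whose rows are the ridge directions xi_i
    f_prime : list of n callables, f_prime[i](t) = f_i'(t)
    x0      : (s,) initial weight vector
    eta     : constant learning rate (illustrative choice)
    """
    x = np.asarray(x0, dtype=float).copy()
    n = xi.shape[0]
    for _ in range(epochs):
        for i in range(n):  # one summand at a time: the "online" pass
            t = xi[i] @ x
            # chain rule: gradient of f_i(xi_i . x) w.r.t. x is f_i'(t) * xi_i
            x -= eta * f_prime[i](t) * xi[i]
    return x

# Example: f_i(t) = (t - y_i)^2 / 2, so f_i'(t) = t - y_i (the LMS case).
rng = np.random.default_rng(0)
xi = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
fp = [lambda t, yi=yi: t - yi for yi in y]
x_star = online_gradient(xi, fp, x0=np.zeros(3), eta=0.05, epochs=200)
```

With the quadratic choice of $f_i$ shown in the example, the update reduces to the classical least-mean-squares rule, which is the simplest instance of the online gradient algorithms analyzed here.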
