On structure-exploiting trust-region regularized nonlinear least squares algorithms for neural-network learning

This paper briefly introduces our numerical linear algebra approaches for solving structured nonlinear least squares problems arising from 'multiple-output' neural-network (NN) models. Our algorithms feature trust-region regularization and exploit the sparsity of either the 'block-angular' residual Jacobian matrix or the 'block-arrow' Gauss-Newton Hessian (or, in the statistical sense, the Fisher information matrix), depending on the problem scale, rendering a large class of NN-learning algorithms efficient in both memory and operation costs. Using a relatively large real-world nonlinear regression application, we explain the algorithmic strengths and weaknesses by analyzing simulation results obtained with both direct and iterative trust-region algorithms on two distinct NN models: 'multilayer perceptrons' (MLP) and 'complementary mixtures of MLP-experts' (or neuro-fuzzy modular networks).
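
To make the trust-region nonlinear least-squares machinery concrete, the sketch below (not taken from the paper) shows one Powell-style dogleg step on the Gauss-Newton model of 0.5*||r(w)||^2. It uses a plain dense NumPy solve purely for illustration; the function name dogleg_gauss_newton_step is hypothetical, and the paper's algorithms would replace the dense solve with structure-exploiting direct or iterative (Krylov) computations on the block-angular Jacobian or block-arrow Gauss-Newton Hessian.

```python
import numpy as np

def dogleg_gauss_newton_step(J, r, delta):
    """One trust-region dogleg step for the nonlinear least-squares
    objective 0.5 * ||r(w)||^2, using the Gauss-Newton quadratic model.

    J     : residual Jacobian at the current weights (m x n)
    r     : residual vector (m,)
    delta : current trust-region radius
    """
    g = J.T @ r                      # gradient of 0.5 * ||r||^2
    # Gauss-Newton step: solve (J^T J) p = -g.  A dense solve is used here
    # for clarity; a structure-exploiting solver would be used in practice.
    p_gn = np.linalg.solve(J.T @ J, -g)
    if np.linalg.norm(p_gn) <= delta:
        return p_gn                  # full Gauss-Newton step fits inside the region

    # Cauchy (steepest-descent) minimizer of the model along -g.
    Jg = J @ g
    p_c = -(g @ g) / (Jg @ Jg) * g
    if np.linalg.norm(p_c) >= delta:
        return -delta * g / np.linalg.norm(g)   # truncated steepest-descent step

    # Dogleg segment: p(tau) = p_c + tau * (p_gn - p_c); pick tau so ||p(tau)|| = delta.
    d = p_gn - p_c
    a, b, c = d @ d, 2.0 * (p_c @ d), (p_c @ p_c) - delta**2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_c + tau * d
```

In a full trust-region loop, the radius delta would then be enlarged or shrunk according to the ratio of the actual reduction in 0.5*||r||^2 to the reduction predicted by the Gauss-Newton model, as is standard for trust-region methods.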
