On structure-exploiting trust-region regularized nonlinear least squares algorithms for neural-network learning

This paper briefly introduces our numerical linear algebra approaches for solving structured nonlinear least squares problems arising from 'multiple-output' neural-network (NN) models. Our algorithms feature trust-region regularization and exploit the sparsity of either the 'block-angular' residual Jacobian matrix or the 'block-arrow' Gauss-Newton Hessian (or, in the statistical sense, the Fisher information matrix), depending on the problem scale, rendering a large class of NN-learning algorithms efficient in both memory and operation costs. Using a relatively large real-world nonlinear regression application, we explain the algorithmic strengths and weaknesses by analyzing simulation results obtained with both direct and iterative trust-region algorithms on two distinct NN models: 'multilayer perceptrons' (MLP) and 'complementary mixtures of MLP-experts' (or neuro-fuzzy modular networks).
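
To make the trust-region nonlinear least-squares machinery concrete, the sketch below (not taken from the paper) shows one Powell-style dogleg step on the Gauss-Newton model of 0.5*||r(w)||^2. It uses a plain dense NumPy solve purely for illustration; the function name dogleg_gauss_newton_step is hypothetical, and the paper's algorithms would replace the dense solve with structure-exploiting direct or iterative (Krylov) computations on the block-angular Jacobian or block-arrow Gauss-Newton Hessian.

```python
import numpy as np

def dogleg_gauss_newton_step(J, r, delta):
    """One trust-region dogleg step for the nonlinear least-squares
    objective 0.5 * ||r(w)||^2, using the Gauss-Newton quadratic model.

    J     : residual Jacobian at the current weights (m x n)
    r     : residual vector (m,)
    delta : current trust-region radius
    """
    g = J.T @ r                      # gradient of 0.5 * ||r||^2
    # Gauss-Newton step: solve (J^T J) p = -g.  A dense solve is used here
    # for clarity; a structure-exploiting solver would be used in practice.
    p_gn = np.linalg.solve(J.T @ J, -g)
    if np.linalg.norm(p_gn) <= delta:
        return p_gn                  # full Gauss-Newton step fits inside the region

    # Cauchy (steepest-descent) minimizer of the model along -g.
    Jg = J @ g
    p_c = -(g @ g) / (Jg @ Jg) * g
    if np.linalg.norm(p_c) >= delta:
        return -delta * g / np.linalg.norm(g)   # truncated steepest-descent step

    # Dogleg segment: p(tau) = p_c + tau * (p_gn - p_c); pick tau so ||p(tau)|| = delta.
    d = p_gn - p_c
    a, b, c = d @ d, 2.0 * (p_c @ d), (p_c @ p_c) - delta**2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_c + tau * d
```

In a full trust-region loop, the radius delta would then be enlarged or shrunk according to the ratio of the actual reduction in 0.5*||r||^2 to the reduction predicted by the Gauss-Newton model, as is standard for trust-region methods.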
