A new class of quasi-Newtonian methods for optimal learning in MLP-networks

In this paper, we present a new class of quasi-Newton methods for effective learning in large multilayer perceptron (MLP) networks. The algorithms introduced in this work, named LQN, exploit an iterative scheme of a generalized BFGS-type method involving a suitable family of matrix algebras L. The main advantages of these methods are their O(n log n) computational cost per step and their O(n) memory requirements, where n is the dimension of the optimization problem (the number of network weights). Numerical experiments, performed on a set of standard MLP benchmarks, show the competitiveness of the LQN methods, especially for large values of n.
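To make the structural idea concrete, the following is a minimal Python sketch (not the authors' implementation) of a quasi-Newton step of this flavor. It assumes the algebra L is the Hartley algebra, i.e., matrices of the form H diag(d) H with H the orthogonal discrete Hartley transform, and it projects the classical BFGS rank-two correction onto that algebra; the function names (hartley, lqn_like_step) and the numerical safeguards are illustrative choices.

```python
import numpy as np

def hartley(x):
    """Orthogonal discrete Hartley transform via the FFT:
    DHT(x) = Re(FFT(x)) - Im(FFT(x)); with the 1/sqrt(n) scaling
    the transform matrix H is symmetric, orthogonal and involutory."""
    X = np.fft.fft(x)
    return (X.real - X.imag) / np.sqrt(len(x))

def lqn_like_step(d, grad, s, y, eps=1e-10):
    """One illustrative structured BFGS-type step.

    d    : eigenvalues of the current approximation B = H diag(d) H
           (only O(n) numbers are stored, never the full matrix)
    grad : current gradient of the error function
    s, y : previous weight step and gradient difference (secant pair)
    """
    hs, hy = hartley(s), hartley(y)
    sBs = hs @ (d * hs)             # s^T B s, computed in transform space
    ys = y @ s                      # BFGS curvature term y^T s
    # Best approximation in the algebra of the rank-two BFGS correction:
    # a rank-one matrix v v^T projects onto the eigenvalues (H v)_i^2.
    d_new = d - (d * hs) ** 2 / max(sBs, eps) + hy ** 2 / max(ys, eps)
    d_new = np.maximum(d_new, eps)  # keep B positive definite
    # Search direction p = -B^{-1} grad: two Hartley transforms and a
    # diagonal solve, hence O(n log n) time and O(n) memory per step.
    p = -hartley(hartley(grad) / d_new)
    return p, d_new
```

In an actual training loop, d would typically be initialized to a vector of ones (so B starts as the identity) and the resulting direction combined with a line search; the O(n log n) per-step cost comes entirely from the FFT-based transforms, since everything else is elementwise on vectors of length n.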
