TREAT: a trust-region-based error-aggregated training algorithm for neural networks

A trust-region-based error-aggregated training algorithm (TREAT) for multi-layer feedforward neural networks is proposed. In the same spirit as the Levenberg-Marquardt (LM) method, the TREAT algorithm uses a Hessian matrix approximation, but one based on the Jacobian of aggregated errors rather than of the individual errors. An aggregation scheme is discussed that greatly reduces the size of the matrix to be inverted in each training iteration and thereby lowers the per-iteration computational cost. Compared with the LM method, the TREAT algorithm is computationally less intensive and requires less memory. This is especially important for large neural networks, for which the LM algorithm becomes impractical.
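As a rough illustration of the kind of update the abstract describes, the sketch below shows an LM-style step computed from aggregated errors. Everything here is an assumption made for illustration, not the paper's actual formulation: the aggregation is modeled as a fixed linear operator `A` that maps the m per-sample errors to k aggregated errors (k much smaller than m), and the step is solved in the dual form so that the matrix to be inverted is only k-by-k, which is one plausible way aggregation could shrink the inverted matrix as claimed. All function and variable names are hypothetical.

```python
import numpy as np

def treat_step(w, residuals_fn, jacobian_fn, A, lam):
    """One illustrative LM-style step on aggregated errors (assumed form,
    not the paper's exact TREAT algorithm).

    w            : current weight vector, shape (n,)
    residuals_fn : w -> per-sample error vector e(w), shape (m,)
    jacobian_fn  : w -> Jacobian de/dw, shape (m, n)
    A            : assumed aggregation matrix, shape (k, m) with k << m,
                   mapping per-sample errors to a few aggregated errors
    lam          : LM-style damping / trust-region parameter
    """
    e = residuals_fn(w)        # per-sample errors, shape (m,)
    J = jacobian_fn(w)         # per-sample Jacobian, shape (m, n)

    e_agg = A @ e              # aggregated errors, shape (k,)
    J_agg = A @ J              # Jacobian of aggregated errors, shape (k, n)

    # Gauss-Newton/LM step written in dual form:
    #   dw = -(J_agg^T J_agg + lam*I)^{-1} J_agg^T e_agg
    #      = -J_agg^T (J_agg J_agg^T + lam*I)^{-1} e_agg,
    # so only a k-by-k system is solved instead of an n-by-n one.
    G = J_agg @ J_agg.T + lam * np.eye(J_agg.shape[0])   # (k, k)
    alpha = np.linalg.solve(G, e_agg)
    dw = -J_agg.T @ alpha                                # (n,)
    return w + dw
```

In an actual trust-region scheme, `lam` (or an explicit step-length bound) would be adjusted after each step according to how well the quadratic model predicts the observed decrease in the training error; that control loop is omitted here.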
