A Comparison of First and Second Order Training Algorithms for Artificial Neural Networks

Minimization methods for training feedforward networks with backpropagation are compared. Feedforward network training is a special case of function minimization in which no explicit model of the data is assumed. For this reason, and because of the high dimensionality of the data, linearizing the training problem through the use of orthogonal basis functions is not desirable; the focus is therefore on function minimization with respect to an arbitrary basis. A number of methods based on the local gradient and Hessian matrix are discussed, and modifications of several first- and second-order training methods are considered. Using share-rate data, it is shown experimentally that conjugate gradient and quasi-Newton methods outperform gradient descent methods, and that the Levenberg-Marquardt algorithm is of particular interest for financial forecasting.

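For reference, the weight-update rules of the optimizer families compared above can be sketched in their standard textbook forms (the notation below is illustrative and not taken from the paper itself):

\[
\begin{aligned}
\text{Gradient descent:}\quad & w_{k+1} = w_k - \eta\, \nabla E(w_k) \\
\text{Conjugate gradient:}\quad & d_k = -\nabla E(w_k) + \beta_k\, d_{k-1}, \qquad w_{k+1} = w_k + \alpha_k\, d_k \\
\text{Newton / quasi-Newton:}\quad & w_{k+1} = w_k - B_k^{-1}\, \nabla E(w_k) \\
\text{Levenberg--Marquardt:}\quad & w_{k+1} = w_k - \left(J_k^{\top} J_k + \mu I\right)^{-1} J_k^{\top} e_k
\end{aligned}
\]

Here \(E(w)\) is the network error, \(\eta\) the learning rate, \(\alpha_k\) a step length found by line search, and \(\beta_k\) a conjugacy coefficient (e.g. Fletcher-Reeves or Polak-Ribière). \(B_k\) is the exact Hessian for Newton's method or a positive-definite approximation built from gradient differences for quasi-Newton methods (e.g. BFGS). \(J_k\) is the Jacobian of the residual vector \(e_k\), and \(\mu \ge 0\) is the Levenberg-Marquardt damping parameter, which interpolates between Gauss-Newton behaviour (\(\mu \to 0\)) and gradient descent behaviour (\(\mu\) large).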