A heuristically enhanced gradient approximation (HEGA) algorithm for training neural networks

In this article we study artificial neural network training under two conditions: (a) the training algorithm must not rely on direct computation of gradients, and (b) it must be efficient for on-line training. We review relevant algorithms currently available in the literature and propose a new algorithm that improves on them with respect to the second condition. We test these algorithms on commonly used benchmark problems and compare their efficiency against the popular backpropagation algorithm. We also introduce a realistic problem involving a robotic elbow manipulator and evaluate the algorithms on it.
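The abstract does not spell out the HEGA update rule, but the family of methods it builds on approximates the gradient from changes in the loss under small weight perturbations rather than from backpropagated derivatives. The sketch below is only an illustration of that general weight-perturbation idea under stated assumptions, not the authors' HEGA algorithm; the single-neuron model, the squared-error loss, the step sizes, and the toy on-line task are all choices made for the example.

```python
# Illustrative sketch only: a generic weight-perturbation update for one sigmoid
# neuron, NOT the HEGA algorithm of the paper (whose update rule is not given here).
# It shows the class of gradient-free, on-line training methods the abstract refers
# to: the gradient is estimated from loss differences under small weight perturbations.
import numpy as np

rng = np.random.default_rng(0)

def loss(w, x, t):
    """Squared error of a single sigmoid neuron on one on-line sample (x, t)."""
    y = 1.0 / (1.0 + np.exp(-x @ w))
    return 0.5 * (y - t) ** 2

def weight_perturbation_step(w, x, t, delta=1e-4, lr=0.5):
    """Approximate dL/dw_i by a forward difference and take one gradient step."""
    base = loss(w, x, t)
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_pert = w.copy()
        w_pert[i] += delta
        grad[i] = (loss(w_pert, x, t) - base) / delta
    return w - lr * grad

# On-line training on a toy 2-input task, one randomly drawn pattern per step.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 0., 0., 1.])
w = rng.normal(scale=0.1, size=2)
for step in range(2000):
    k = rng.integers(len(X))
    w = weight_perturbation_step(w, X[k], T[k])
print("trained weights:", w)
```

Because each weight is perturbed and the loss re-evaluated, the cost per update grows with the number of weights; the summed-perturbation and correlation-based variants cited by the paper exist precisely to reduce that overhead, which is what condition (b) above is concerned with.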
