Gradient descent learning algorithm overview: a general dynamical systems perspective

This paper gives a unified treatment of gradient descent learning algorithms for neural networks within a general dynamical systems framework. The approach organizes and simplifies all the known algorithms and results, which were originally derived for different problems (fixed point/trajectory learning), different models (discrete/continuous), different architectures (feedforward/recurrent), and with different techniques (backpropagation, variational calculus, adjoint methods, etc.). It can also be applied to derive new algorithms. The author then briefly examines some of the complexity issues and limitations intrinsic to gradient descent learning. Throughout the paper, the author focuses on the problem of trajectory learning.
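To make the idea of trajectory learning by gradient descent concrete, the following is a minimal sketch and not the paper's own formulation. It assumes a discrete-time recurrent network u(t+1) = tanh(W u(t)) and a quadratic error between the network trajectory and a target trajectory. The gradient with respect to W is obtained by a backward (adjoint) recursion, which in this discrete setting coincides with backpropagation through time. All function names, the tanh nonlinearity, and the step size are assumptions chosen for illustration.

```python
import numpy as np

def simulate(W, u0, T):
    """Run the assumed dynamics u(t+1) = tanh(W u(t)) for T steps."""
    us = [u0]
    for _ in range(T):
        us.append(np.tanh(W @ us[-1]))
    return np.array(us)  # shape (T+1, n); row 0 is the initial state

def trajectory_error(W, u0, target):
    """E = 1/2 * sum over t = 1..T of ||u(t) - u*(t)||^2."""
    us = simulate(W, u0, len(target))
    return 0.5 * np.sum((us[1:] - target) ** 2)

def adjoint_gradient(W, u0, target):
    """dE/dW via a backward (adjoint) recursion over the stored trajectory."""
    T = len(target)
    us = simulate(W, u0, T)
    grad = np.zeros_like(W)
    lam = np.zeros_like(u0)  # error signal reaching u(t) from later times
    for t in range(T, 0, -1):
        lam = lam + (us[t] - target[t - 1])   # add the direct error at time t
        delta = lam * (1.0 - us[t] ** 2)      # pass through the tanh derivative
        grad += np.outer(delta, us[t - 1])    # contribution of step t to dE/dW
        lam = W.T @ delta                     # propagate back to u(t-1)
    return grad

def gradient_descent(W, u0, target, step=0.01, iterations=100):
    """Plain gradient descent on W; step and iterations are illustrative."""
    W = W.copy()
    for _ in range(iterations):
        W -= step * adjoint_gradient(W, u0, target)
    return W
```

A typical use would generate a target with `simulate` from some reference weight matrix, start from small random weights, and call `gradient_descent`. The backward pass costs about as much as one forward simulation, which is the usual reason to prefer the adjoint recursion over perturbing each weight separately.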
