Backpropagation Through Time: What It Does and How to Do It

Backpropagation is now the most widely used tool in the field of artificial neural networks. At the core of backpropagation is a method for calculating derivatives exactly and efficiently in any large system made up of elementary subsystems or calculations which are represented by known, differentiable functions; thus, backpropagation has many applications which do not involve neural networks as such. This paper first reviews basic backpropagation, a simple method which is now being widely used in areas like pattern recognition and fault diagnosis. Next, it presents the basic equations for backpropagation through time and discusses applications to areas like pattern recognition involving dynamic systems, system identification, and control. Finally, it describes further extensions of the method to systems other than neural networks, to systems involving simultaneous equations or true recurrent networks, and to other practical issues which arise in its use. Pseudocode is provided to clarify the algorithms. The chain rule for ordered derivatives, the theorem which underlies backpropagation, is briefly discussed.
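To make the core idea concrete, the sketch below shows backpropagation through time for a minimal tanh recurrent network with a squared-error loss: the network is unrolled over the sequence, and the chain rule is applied backward through the unrolled graph, accumulating each weight's gradient over every time step at which it was used. This is a minimal illustrative sketch in NumPy, not the paper's own pseudocode; the network shape, the loss, and all names (bptt, W_xh, W_hh, W_hy) are assumptions made for this example.

```python
import numpy as np

def bptt(x_seq, y_seq, W_xh, W_hh, W_hy):
    """One forward/backward pass of backpropagation through time for a
    minimal tanh recurrent network (illustrative sketch, not the paper's
    pseudocode). Returns the loss and the gradient of each weight matrix."""
    T, h_dim = len(x_seq), W_hh.shape[0]
    hs = {-1: np.zeros(h_dim)}           # hidden states; hs[-1] is the initial state
    ys, loss = {}, 0.0
    # Forward pass: unroll the network over all T time steps.
    for t in range(T):
        hs[t] = np.tanh(W_xh @ x_seq[t] + W_hh @ hs[t - 1])
        ys[t] = W_hy @ hs[t]
        loss += 0.5 * np.sum((ys[t] - y_seq[t]) ** 2)
    # Backward pass: apply the chain rule backward through the unrolled graph.
    dW_xh, dW_hh, dW_hy = (np.zeros_like(W) for W in (W_xh, W_hh, W_hy))
    dh_next = np.zeros(h_dim)            # gradient flowing back from step t+1
    for t in reversed(range(T)):
        dy = ys[t] - y_seq[t]            # derivative of squared error w.r.t. output
        dW_hy += np.outer(dy, hs[t])
        dh = W_hy.T @ dy + dh_next       # direct path plus recurrent path
        dz = (1.0 - hs[t] ** 2) * dh     # back through tanh
        dW_xh += np.outer(dz, x_seq[t])
        dW_hh += np.outer(dz, hs[t - 1])
        dh_next = W_hh.T @ dz            # pass gradient to step t-1
    return loss, dW_xh, dW_hh, dW_hy

# Example usage with random data and small random weights.
rng = np.random.default_rng(0)
T, x_dim, h_dim, y_dim = 5, 3, 4, 2
x_seq = rng.standard_normal((T, x_dim))
y_seq = rng.standard_normal((T, y_dim))
W_xh = 0.1 * rng.standard_normal((h_dim, x_dim))
W_hh = 0.1 * rng.standard_normal((h_dim, h_dim))
W_hy = 0.1 * rng.standard_normal((y_dim, h_dim))
loss, dW_xh, dW_hh, dW_hy = bptt(x_seq, y_seq, W_xh, W_hh, W_hy)
```

Note that, because the same weights are reused at every time step, each gradient is a sum of per-step contributions; this summation over the unrolled graph is what distinguishes backpropagation through time from basic backpropagation on a feedforward network.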
