New results on recurrent network training: unifying the algorithms and accelerating convergence

How to train recurrent networks efficiently remains a challenging and active research topic. Most proposed training approaches are based on computational ways of efficiently obtaining the gradient of the error function, and they can be grouped into five major categories. In this study we present a derivation that unifies these approaches: we show that they are simply five different ways of solving a particular matrix equation. The second goal of this paper is to develop a new algorithm based on the insights gained from this novel formulation. The new algorithm, which approximates the error gradient, has lower computational complexity per weight update than competing techniques for most typical problems, and it reaches the error minimum in far fewer iterations. Because the ability to update the weights in an on-line fashion is a desirable characteristic of recurrent network training algorithms, we have also developed an on-line version of the proposed algorithm, based on updating the error-gradient approximation recursively.
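Since the abstract refers to gradient-based recurrent network training without giving code, the following is a minimal illustrative sketch of one classical gradient-computation scheme, backpropagation through time (BPTT), applied to a toy fully recurrent network. All dimensions, the loss, the data, and the learning rate here are hypothetical choices for illustration; this sketch shows the generic exact-gradient approach, not the paper's proposed approximation algorithm.

import numpy as np

# Hypothetical toy setup: a single-layer recurrent network
#   h[t+1] = tanh(W_in x[t] + W_rec h[t]),  y[t] = w_out . h[t+1]
# trained to minimize the squared error E = 0.5 * sum_t (y[t] - d[t])^2.
# This is a generic BPTT sketch, NOT the algorithm proposed in the paper.
rng = np.random.default_rng(0)
n_in, n_hid, T = 2, 8, 50                            # assumed toy dimensions

W_in  = rng.normal(scale=0.1, size=(n_hid, n_in))    # input-to-hidden weights
W_rec = rng.normal(scale=0.1, size=(n_hid, n_hid))   # recurrent weights
w_out = rng.normal(scale=0.1, size=n_hid)            # hidden-to-output weights

x = rng.normal(size=(T, n_in))                       # toy input sequence
d = np.sin(np.arange(T) * 0.3)                       # toy target sequence

def forward(x):
    """Run the network forward, storing hidden states for the backward pass."""
    h = np.zeros((T + 1, n_hid))
    y = np.zeros(T)
    for t in range(T):
        h[t + 1] = np.tanh(W_in @ x[t] + W_rec @ h[t])
        y[t] = w_out @ h[t + 1]
    return h, y

def bptt_gradients(x, d):
    """Exact gradient of E = 0.5 * sum of squared errors, by unfolding in time."""
    h, y = forward(x)
    e = y - d                                        # output errors
    gW_in, gW_rec = np.zeros_like(W_in), np.zeros_like(W_rec)
    gw_out = np.zeros_like(w_out)
    delta = np.zeros(n_hid)                          # backpropagated hidden error
    for t in reversed(range(T)):
        # Error flowing into the pre-activation of h[t+1]: direct output
        # contribution plus the contribution propagated back from step t+1.
        delta = (e[t] * w_out + W_rec.T @ delta) * (1.0 - h[t + 1] ** 2)
        gw_out += e[t] * h[t + 1]
        gW_in  += np.outer(delta, x[t])
        gW_rec += np.outer(delta, h[t])
    return gW_in, gW_rec, gw_out

# Plain batch gradient descent; the paper's on-line variant would instead
# update a gradient approximation recursively at every time step.
lr = 0.01
for epoch in range(200):
    gW_in, gW_rec, gw_out = bptt_gradients(x, d)
    W_in -= lr * gW_in
    W_rec -= lr * gW_rec
    w_out -= lr * gw_out

_, y = forward(x)
print("final squared error:", float(np.sum((y - d) ** 2)))

Because BPTT unfolds the full sequence before each update, its per-update cost grows with the sequence length; the cheaper per-update cost and faster convergence claimed in the abstract are precisely what motivate the paper's gradient-approximation alternative.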
