Fast and efficient second-order training of the dynamic neural network paradigm

In many applications, neural networks must process or generate time series, and various network paradigms exist for this purpose. Two prominent examples are time-delay neural networks (TDNN), which are known for their noise suppression capability, and NARX (nonlinear autoregressive with exogenous inputs) networks, which have powerful modeling ability (they are at least Turing-equivalent). In this article, we propose a combination of these two approaches, called the dynamic neural network (DYNN), which combines the advantages of both. Efficient training algorithms are needed to adjust the weights of a DYNN. Here, we describe an algorithm for computing first-order information about the error surface: temporal backpropagation through time (TBPTT). Essentially, this algorithm combines temporal backpropagation (used for TDNN) with backpropagation through time (used for NARX). The first-order information is then used by the scaled conjugate gradient (SCG) learning algorithm, which approximates second-order information using only first-order information. The benefits of this approach are demonstrated on two benchmark data sets: "logistic map" and "building". The results show that SCG for DYNN is significantly faster and more accurate than other learning algorithms (e.g., TBPTT, resilient propagation, memoryless quasi-Newton).
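As an illustration of the kind of data such a network is trained on, the following minimal Python sketch generates a logistic map series, x[t+1] = r * x[t] * (1 - x[t]), and arranges it into delay windows of past values paired with the next value, as a tapped delay line would present them. The parameter r = 4.0, the start value x0 = 0.3, the number of delays, and the function names `logistic_map` and `delay_windows` are assumptions made for this example; the abstract does not state how the "logistic map" benchmark was configured.

```python
def logistic_map(n_steps, r=4.0, x0=0.3):
    """Iterate x[t+1] = r * x[t] * (1 - x[t]); r and x0 are assumed values."""
    series = [x0]
    for _ in range(n_steps - 1):
        x = series[-1]
        series.append(r * x * (1.0 - x))
    return series


def delay_windows(series, n_delays):
    """Pair each window of n_delays past values with the value that follows it."""
    pairs = []
    for t in range(n_delays, len(series)):
        window = series[t - n_delays:t]  # inputs: the n_delays most recent values
        target = series[t]               # target: the next value in the series
        pairs.append((window, target))
    return pairs


# Hypothetical usage: 200 samples, 3 delayed inputs per training pattern.
series = logistic_map(200)
pairs = delay_windows(series, n_delays=3)
```

Each pair holds the delayed inputs and the one-step-ahead target; a recurrent model of the NARX type would additionally feed its own delayed outputs back as inputs, which this sketch does not show.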
