A delay damage model selection algorithm for NARX neural networks

Recurrent neural networks have become popular models for system identification and time series prediction. NARX (nonlinear autoregressive with exogenous inputs) neural networks are a popular subclass of recurrent networks and have been used in many applications. Although embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show that intelligent memory order selection, through pruning and good initial heuristics, significantly improves the generalization and predictive performance of these nonlinear systems on problems as diverse as grammatical inference and time series prediction.
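To make the notion of embedded memory concrete, the following is a minimal sketch of one common NARX formulation, in which the output at time t is a nonlinear function of delayed inputs and delayed outputs. It is an illustration under stated assumptions, not the algorithm evaluated here: the names `narx_regressor`, `narx_simulate`, `input_delays`, and `output_delays` are hypothetical, and the map `f` stands in for a trained network. In this picture the memory order is the set of delay taps, so pruning a delay amounts to removing an entry from one of the tap lists.

```python
def narx_regressor(u, y, t, input_delays, output_delays):
    """Collect the delayed inputs and outputs that feed the model at time t."""
    return [u[t - d] for d in input_delays] + [y[t - d] for d in output_delays]


def narx_simulate(f, u, y_init, input_delays, output_delays):
    """Run the model in closed loop: each prediction is fed back through the output taps.

    f is any map from a regressor list to a scalar (hypothetical stand-in for a network).
    y_init must supply at least max(delay) initial output values.
    """
    start = max(list(input_delays) + list(output_delays))
    y = list(y_init[:start])
    for t in range(start, len(u)):
        y.append(f(narx_regressor(u, y, t, input_delays, output_delays)))
    return y


# Illustrative stand-in for the nonlinear map (linear here only to keep the sketch short).
def toy_map(x):
    return 0.5 * x[0] + 0.25 * x[1]


# One input tap and one output tap, each with delay 1.
trajectory = narx_simulate(toy_map, [1, 1, 1, 1], [0], [1], [1])

# A pruned memory order: input tap 2 has been removed, leaving delays 1 and 3.
pruned = narx_regressor([10, 20, 30, 40], [1, 2, 3, 4], 3, [1, 3], [2])
```

Representing the taps as explicit lists of delays, rather than as a single maximum order, is a design choice made for this sketch: it lets a pruned model skip intermediate delays without changing the rest of the code.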
