The past is important: a method for determining memory structure in NARX neural networks

Recurrent networks have become popular models for system identification and time series prediction. NARX (nonlinear autoregressive with exogenous inputs) networks are a popular subclass of recurrent networks and have been used in many applications. Although embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show that intelligent memory order selection, through pruning and good initial heuristics, significantly improves the generalization and predictive performance of these nonlinear systems on problems as diverse as grammatical inference and time series prediction.
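The abstract names the idea, memory order selection by pruning, but not the procedure. As an illustration only, the numpy sketch below shows one plausible reading: start a NARX model with generously long input and output tapped-delay lines, train it, rank taps by a saliency proxy (here, sum of squared fan-out weights, in the spirit of skeletonization-style relevance pruning), and greedily remove taps while validation error does not degrade. The network, saliency measure, synthetic system, and all names (NARXNet, tap_saliency, and so on) are assumptions made for this sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse-memory system: y depends only on u(t-3) and y(t-1).
N = 600
u = rng.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for t in range(4, N):
    y[t] = 0.5 * y[t - 1] + np.sin(u[t - 3]) + 0.02 * rng.normal()

def build(taps):
    """Regression pairs from a list of ('u'|'y', delay) taps."""
    start = max(d for _, d in taps) + 1
    X = np.array([[(u if k == "u" else y)[t - d] for k, d in taps]
                  for t in range(start, N)])
    return X, y[start:]

class NARXNet:
    """One-hidden-layer MLP over the tapped-delay feature vector."""
    def __init__(self, n_in, n_hidden=8):
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0

    def forward(self, X):
        self.H = np.tanh(X @ self.W1 + self.b1)   # cache for backprop
        return self.H @ self.W2 + self.b2

    def train(self, X, T, lr=0.05, epochs=500):
        for _ in range(epochs):
            err = self.forward(X) - T             # dMSE/dpred (up to a constant)
            dH = np.outer(err, self.W2) * (1.0 - self.H ** 2)
            self.W2 -= lr * self.H.T @ err / len(T)
            self.b2 -= lr * err.mean()
            self.W1 -= lr * X.T @ dH / len(T)
            self.b1 -= lr * dH.mean(axis=0)

def tap_saliency(net):
    """Relevance proxy per tap: sum of squared fan-out weights."""
    return (net.W1 ** 2).sum(axis=1)

def fit_and_score(taps):
    X, T = build(taps)
    cut = int(0.7 * len(T))                       # train/validation split
    net = NARXNet(X.shape[1])
    net.train(X[:cut], T[:cut])
    return net, float(np.mean((net.forward(X[cut:]) - T[cut:]) ** 2))

# Generous initial memory orders, then greedy tap pruning.
taps = [("u", d) for d in range(6)] + [("y", d) for d in range(1, 6)]
net, best = fit_and_score(taps)
while len(taps) > 1:
    k = int(np.argmin(tap_saliency(net)))         # least relevant tap
    trial = taps[:k] + taps[k + 1:]
    trial_net, val = fit_and_score(trial)
    if val > 1.1 * best:                          # stop once pruning hurts
        break
    taps, net, best = trial, trial_net, min(best, val)

print("retained taps:", taps, "validation MSE:", round(best, 5))
```

On this synthetic system, whose true memory structure is sparse (only u(t-3) and y(t-1) matter), the loop typically discards most of the superfluous taps. The point is the control flow; the paper's method would supply its own saliency criterion and initialization heuristics in place of the simple proxy used here.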
