Predictive Minimum Description Length Criterion for Time Series Modeling with Neural Networks

We present nonlinear time series modeling with a multilayer perceptron network. An important aspect of such modeling is model selection, i.e., determining the size and complexity of the model. To address this problem, we apply the predictive minimum description length (PMDL) principle as the minimization criterion. For a neural network, this means minimizing the number of input and hidden units. Three time series modeling experiments are used to examine the usefulness of the PMDL model selection scheme, and a comparison with the widely used cross-validation technique is also presented. In our experiments, the PMDL scheme and the cross-validation scheme yield similar results in terms of model complexity, but the PMDL method was twice as fast to compute. This is a significant improvement, since model selection in general is very time-consuming.
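
The idea of predictive model selection can be illustrated with a minimal sketch. It is not the procedure used in the paper: as a simplifying assumption, it replaces the multilayer perceptron with a linear autoregressive model fitted by least squares, so that only the number of inputs (lags) is selected. Under a fixed-variance Gaussian error assumption, the predictive code length reduces, up to constants, to the accumulated squared one-step-ahead prediction error, where each prediction uses parameters estimated from earlier samples only. The function names, the synthetic series, and the start index `t0` are all hypothetical choices made for illustration.

```python
import numpy as np


def lagged(x, p):
    """Return rows [x[t-1], ..., x[t-p]] and targets x[t] for t = p .. len(x)-1."""
    n = len(x)
    X = np.column_stack([x[p - k - 1 : n - k - 1] for k in range(p)])
    return X, x[p:]


def predictive_score(x, p, t0):
    """Mean squared one-step error, each prediction using only earlier samples."""
    X, y = lagged(x, p)
    errors = []
    for t in range(t0, len(x)):
        i = t - p  # row of X whose target is x[t]
        # Fit on the past only: targets x[p] .. x[t-1].
        coef, *_ = np.linalg.lstsq(X[:i], y[:i], rcond=None)
        errors.append(y[i] - X[i] @ coef)
    return float(np.mean(np.square(errors)))


def select_order(x, max_p, t0):
    """Pick the lag order with the smallest accumulated predictive error."""
    scores = {p: predictive_score(x, p, t0) for p in range(1, max_p + 1)}
    return min(scores, key=scores.get), scores


# Synthetic AR(2) series (made-up coefficients, for illustration only).
rng = np.random.default_rng(0)
x = np.zeros(400)
for t in range(2, len(x)):
    x[t] = 1.2 * x[t - 1] - 0.6 * x[t - 2] + rng.normal()

# All candidate orders are scored on the same time points t0 .. len(x)-1.
best, scores = select_order(x, max_p=4, t0=50)
print(best, scores)
```

Because every prediction is made before the predicted sample is seen, no separate validation set is held out, which is the main practical contrast with cross-validation. Extending the sketch to a neural network would mean refitting or updating the network inside the loop and searching over hidden units as well as inputs.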
