Accumulative prediction error and the selection of time series models

Abstract. This article reviews the rationale for using accumulative one-step-ahead prediction error (APE) as a data-driven method for model selection. Theoretically, APE is closely related to Bayesian model selection and the method of minimum description length (MDL). The sole requirement for using APE is that the models under consideration are capable of generating a prediction for the next, unseen data point. This means that APE may be readily applied to selection problems involving very complex models. APE automatically takes the functional form of parameters into account, and the ‘plug-in’ version of APE does not require the specification of priors. APE is particularly easy to compute for data that have a natural ordering, such as time series. Here, we explore the possibility of using APE to discriminate the short-range ARMA(1,1) model from the long-range ARFIMA(0, d, 0) model. We also illustrate how APE may be used for model meta-selection, allowing one to choose between different model selection methods.
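
The plug-in procedure described above can be sketched in a few lines. The following is an illustrative sketch only, not the authors' implementation: it assumes squared-error loss, and for brevity it replaces the ARMA(1,1) and ARFIMA(0, d, 0) candidates with two much simpler ones, a constant-mean predictor and a least-squares AR(1) predictor. The names `ape`, `mean_predict`, `ar1_predict`, `simulate_ar1`, and the `start` parameter are hypothetical and do not come from the article.

```python
import numpy as np

def ape(series, fit_predict, start=10):
    """Accumulate squared one-step-ahead prediction errors (assumed loss).

    At each time t, the model is refit on series[:t] only and asked to
    predict series[t]; the squared errors are summed over t.
    """
    total = 0.0
    for t in range(start, len(series)):
        prediction = fit_predict(series[:t])
        total += (series[t] - prediction) ** 2
    return total

def mean_predict(past):
    """Candidate 1 (illustrative): predict the next point by the sample mean."""
    return float(np.mean(past))

def ar1_predict(past):
    """Candidate 2 (illustrative): AR(1) with intercept, fit by least squares."""
    x, y = past[:-1], past[1:]
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0] + coef[1] * past[-1])

def simulate_ar1(n, phi, seed):
    """Generate an AR(1) series with standard normal innovations."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

if __name__ == "__main__":
    series = simulate_ar1(500, phi=0.8, seed=0)
    scores = {
        "mean": ape(series, mean_predict),
        "AR(1)": ape(series, ar1_predict),
    }
    # The candidate with the smallest accumulated error is selected.
    print(min(scores, key=scores.get), scores)
```

Because each prediction uses only the data observed so far, the only thing a candidate has to supply is a function that maps the past to a forecast of the next point. No priors are specified, which is what makes this a plug-in variant in the sense used in the abstract.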
