Asymptotically Optimal Prediction for Time-Varying Data Generating Processes

We develop a methodology, referred to as kinetic prediction, for predicting time series whose data generating distributions undergo unknown changes. Based on Kolmogorov-Tikhomirov's $\varepsilon$-entropy, we propose a concept called $\varepsilon$-predictability that quantifies the size of a model class (which can be parametric or nonparametric) and the maximal number of abrupt structural changes that guarantee the achievability of asymptotically optimal prediction. Moreover, for parametric distribution families, we extend kinetic prediction from discretized function spaces to its counterpart with continuous function spaces, and propose a sequential Monte Carlo-based implementation. We also extend our methodology to predict smoothly varying data generating distributions. Under reasonable assumptions, we prove that the average predictive performance converges almost surely to the oracle bound, which corresponds to the case in which the data generating distributions are known in advance. The results also shed some light on the so-called "prediction-inference dilemma." Various examples and numerical results are provided to demonstrate the wide applicability of our methodology.
