Model selection for weakly dependent time series forecasting

Observing a stationary time series, we propose a two-step procedure for predicting its next value. The first step follows the machine learning paradigm: it builds a set of candidate predictors, obtained as randomized estimators in (possibly numerous) different predictive models. The second step follows the model selection paradigm: it chooses, among all the predictors produced in the first step, one with good properties. We study the procedure for two types of observations: causal Bernoulli shifts and bounded weakly dependent processes. In both cases we prove oracle inequalities: the risk of the chosen predictor is close to the best prediction risk over all the predictive models considered. We apply the procedure to predictive models such as linear predictors, neural network predictors and non-parametric autoregressive predictors.
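To make the fit-then-select structure concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption, not the paper's method: the candidate predictors are plain least-squares autoregressive fits of different orders (standing in for the randomized estimators of the first step), and the selection step simply minimizes the empirical prediction risk on a later segment of the sample, rather than using the paper's penalized criterion. The oracle inequalities then have the generic shape R(selected) <= min over models j of { best risk in model j + a remainder term vanishing with the sample size }.

```python
# Sketch of the two-step procedure from the abstract, under assumptions:
# - step 1 fits ordinary least-squares AR(p) predictors for several orders p
#   (the paper uses randomized estimators; OLS is a stand-in here);
# - step 2 selects the candidate with the smallest empirical prediction risk
#   on a held-out segment (the paper proves oracle inequalities for a
#   different, penalized selection rule).
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(n, phi=0.6, sigma=0.5):
    """Simulate a stationary AR(1) series, a simple causal Bernoulli shift."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + sigma * rng.standard_normal()
    return x

def fit_ar(x, order):
    """Step 1: least-squares AR(order) coefficients on the training segment."""
    rows = [x[t - order:t][::-1] for t in range(order, len(x))]
    X, y = np.asarray(rows), x[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def prediction_risk(x, coef):
    """Empirical quadratic risk of one-step predictions on the series x."""
    p = len(coef)
    preds = np.array([coef @ x[t - p:t][::-1] for t in range(p, len(x))])
    return float(np.mean((x[p:] - preds) ** 2))

x = simulate_ar1(600)
train, valid = x[:400], x[400:]

# Step 1: one candidate predictor per predictive model (here, per AR order).
candidates = {p: fit_ar(train, p) for p in range(1, 6)}

# Step 2: model selection -- keep the candidate with the best empirical risk.
risks = {p: prediction_risk(valid, c) for p, c in candidates.items()}
best_order = min(risks, key=risks.get)
print(f"selected AR order: {best_order}")
```

The simulated AR(1) series is used because it is a simple instance of a causal Bernoulli shift, one of the two classes of processes for which the paper states its oracle inequalities.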
