Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm

We consider the model selection problem in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of high order, but with memory of variable length. Various aims in selecting a VLMC can be formalized with different, non-equivalent risks, such as final prediction error or expected Kullback-Leibler information. We consider the asymptotic behavior of different risk functions and show how they can all be estimated with the same resampling strategy. Such estimated risks then yield new model selection criteria. In particular, we obtain a data-driven tuning of Rissanen's tree-structured context algorithm, which is a computationally feasible procedure for selection and estimation of a VLMC.
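To make the idea of a tree-structured context selection concrete, here is a minimal Python sketch of a context-tree fit with a fixed cutoff. It is an illustration under stated assumptions, not Rissanen's exact algorithm and not the data-driven tuning proposed in the paper. The function names (`count_contexts`, `delta`, `fit_context_tree`), the parameters `max_depth` and `cutoff`, and the particular pruning statistic (a count-weighted log-likelihood ratio between a context and its one-step-shorter parent) are all assumptions made for this example.

```python
from collections import defaultdict
from math import log


def count_contexts(seq, max_depth):
    """Count next-symbol occurrences for every context up to max_depth.

    A context is a tuple of preceding symbols, most recent first,
    so dropping the last element gives the one-step-shorter parent.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for t in range(len(seq)):
        for d in range(min(max_depth, t) + 1):
            ctx = tuple(seq[t - d:t][::-1])  # d preceding symbols, newest first
            counts[ctx][seq[t]] += 1
    return counts


def delta(counts, ctx):
    """Assumed pruning statistic: sum_x N(ctx, x) * log(P(x|ctx) / P(x|parent))."""
    parent = ctx[:-1]
    n_ctx = sum(counts[ctx].values())
    n_par = sum(counts[parent].values())
    return sum(
        n * log((n / n_ctx) / (counts[parent][x] / n_par))
        for x, n in counts[ctx].items()
        if n > 0
    )


def fit_context_tree(seq, max_depth, cutoff):
    """Keep a context if its statistic reaches the cutoff; keep its ancestors too."""
    counts = count_contexts(seq, max_depth)
    keep = {()}  # the empty context (root) is always kept
    for ctx in sorted(counts, key=len, reverse=True):
        if ctx and delta(counts, ctx) >= cutoff:
            # Adding every prefix keeps the selected set closed under "parent".
            for k in range(len(ctx) + 1):
                keep.add(ctx[:k])
    return keep, counts


# Usage: a strictly alternating sequence needs only one symbol of memory.
keep, counts = fit_context_tree("ab" * 50, max_depth=3, cutoff=2.0)
print(sorted(keep, key=len))
```

In this sketch the cutoff is supplied by hand. The abstract's point is that such a tuning parameter can instead be chosen by minimizing an estimated risk obtained by resampling; that step is not implemented here.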
