Linear model selection by cross-validation

We consider the problem of model (or variable) selection in the classical regression model, based on cross-validation with an added penalty term that penalizes overfitting. Under weak conditions, the new criterion is shown to be strongly consistent in the sense that, with probability one, for all large n the criterion chooses the smallest true model. The penalty function, denoted C_n, depends on the sample size n and is chosen to ensure consistent selection of the true model. Various choices of C_n have been suggested in the model-selection literature. In this paper we show that a particular choice of C_n based on the observed data, which makes the penalty random, preserves the consistency property and provides improved performance over a fixed choice of C_n.
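To make the idea concrete, here is a minimal sketch of a penalized cross-validation criterion of this general flavor. The specific penalty form C_n * (model size) / n, the leave-one-out error, and the choice C_n = log(n) are illustrative assumptions for this sketch, not the paper's exact criterion.

```python
import numpy as np

def loo_cv_score(X, y):
    """Leave-one-out CV mean squared error for least squares,
    computed via the hat-matrix shortcut (no refitting)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))          # LOO residuals
    return float(np.mean(loo_resid ** 2))

def select_model(X, y, candidates, C_n):
    """Choose the candidate variable subset minimizing
    CV error + C_n * (model size) / n  (illustrative penalty form)."""
    n = len(y)
    scores = {c: loo_cv_score(X[:, list(c)], y) + C_n * len(c) / n
              for c in candidates}
    return min(scores, key=scores.get), scores

# Synthetic demo: only the first two of four predictors matter.
rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(n)

candidates = [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]
best, scores = select_model(X, y, candidates, C_n=np.log(n))
```

With a growing penalty such as C_n = log(n), larger models that add only irrelevant variables pay a penalty that eventually outweighs their small reduction in cross-validation error, which is the intuition behind the consistency result.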