Characterizing the generalization performance of model selection strategies

We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential aspects of a model selection task by the bias and variance profiles it generates over its sequence of hypothesis classes. With this view, we develop a new understanding of complexity-penalization methods: First, the penalty terms can be interpreted as postulating a particular profile for the variances as a function of model complexity; if the postulated and true profiles do not match, systematic under-fitting or over-fitting results, depending on whether the penalties are too large or too small. Second, we observe that it is generally best to penalize according to the true variances of the task, and therefore no fixed penalization strategy can be optimal across all problems. We then use this characterization to introduce the notion of easy versus hard model selection problems. Here we show that if the variance profile grows too rapidly relative to the biases, standard model selection techniques become prone to significant errors. This can happen, for example, in regression problems where the independent variables are drawn from wide-tailed distributions. To counter this, we discuss a new model selection strategy that dramatically outperforms standard complexity-penalization and hold-out methods on these hard tasks.
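For concreteness, the view underlying this analysis can be written in standard squared-loss notation (the symbols below are ours, not necessarily those used in the body of the paper, and irreducible noise is omitted). If $f$ is the target, $\hat{f}_k$ is the hypothesis fit in class $H_k$, and $\widehat{\mathrm{err}}$ denotes empirical error, then the expected error at a query point $x$ splits into a bias profile and a variance profile over $k$, and complexity penalization selects the class minimizing a penalized empirical error, with the penalty playing the role of a postulated variance profile:

\[
\mathbb{E}\big[(\hat{f}_k(x) - f(x))^2\big]
  \;=\; \underbrace{\big(\mathbb{E}[\hat{f}_k(x)] - f(x)\big)^2}_{\mathrm{bias}^2(k)}
  \;+\; \underbrace{\mathbb{E}\big[(\hat{f}_k(x) - \mathbb{E}[\hat{f}_k(x)])^2\big]}_{\mathrm{var}(k)},
\qquad
\hat{k} \;=\; \operatorname*{arg\,min}_k \Big[\, \widehat{\mathrm{err}}(\hat{f}_k) + \mathrm{pen}(k) \,\Big].
\]

Under this reading, choosing $\mathrm{pen}(k)$ amounts to guessing $\mathrm{var}(k)$: a penalty that grows faster than the true variance profile biases selection toward small $k$ (systematic under-fitting), while one that grows slower biases it toward large $k$ (systematic over-fitting).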