ON THE CONSEQUENCES OF THE STATISTICAL MECHANICS THEORY OF LEARNING CURVES FOR THE MODEL SELECTION PROBLEM

The Statistical Mechanics (SM) approach to the analysis of learning curves has enjoyed increased attention and success in the field of computational learning theory over the past several years. In part due to the novelty of its technical methods, and in part due to its identification of interesting learning curve behavior not explained by classical power law theories, the SM theory has emerged as an important complement to the powerful and general Vapnik-Chervonenkis (VC) theory of learning curves.

To crudely summarize the differences between the SM and VC theories, we can say that the VC theory requires less knowledge of the problem specifics than the SM theory, but the VC theory may suffer for this more general approach by predicting learning curves that deviate further from the actual behavior than those predicted by the SM theory. It is worth emphasizing that from a mathematically rigorous point of view, both theories may offer only an upper bound on the learning curve, and these upper bounds will diverge from the true behavior as the implicit assumptions of the theories are violated by the problem under consideration. However, it seems fair to assert that the SM theory has a better chance at capturing the true behavior of the learning curve, since more specifics of the problem are taken into account.

While the SM theory has contributed new methods of analysis and new bounds on learning curves, it has done so for essentially the same learning algorithms that are considered in the VC theory and its variants. For instance, it is common in the VC framework to analyze the learning algorithm that chooses the hypothesis minimizing the training error (breaking ties arbitrarily), and it is now known that the SM theory can also provide learning curve upper bounds for this algorithm. Similarly, many investigations in the SM theory focus on the Gibbs algorithm (which essentially chooses a hypothesis randomly from a distribution that exponentially penalizes training error), but the tools of the VC theory are equally applicable here as well. Thus the SM theory is primarily descriptive rather than prescriptive: we may obtain new and better bounds on learning curves, but for the same algorithms we have been studying all along.

Are there natural learning problems in which the sometimes rather different predictions made by the VC and SM theories would have algorithmic consequences? Here we argue that model selection, in which we must choose the appropriate value for the complexity of our hypothesis, is such a problem. An informal example will serve to illustrate the issues. Suppose we are given a set of training data $S = \{\langle x_i, b_i \rangle\}_{i=1}$
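
For concreteness, the two selection rules discussed above can be sketched under assumed notation: a hypothesis class $\mathcal{H}$, a count $\hat{e}_S(h)$ of the errors that hypothesis $h$ makes on the training set $S$ just introduced, and a parameter $\beta > 0$, none of which are defined at this point in the text.

\[
% Note: \mathcal{H}, \hat{e}_S(h), and \beta are assumed notation for this sketch, not taken from the text.
h_{\mathrm{erm}} \in \operatorname*{arg\,min}_{h \in \mathcal{H}} \hat{e}_S(h),
\qquad\qquad
\Pr[\text{Gibbs selects } h] \;=\; \frac{e^{-\beta\, \hat{e}_S(h)}}{\sum_{h' \in \mathcal{H}} e^{-\beta\, \hat{e}_S(h')}} .
\]

The first rule is training error minimization with ties broken arbitrarily; the second makes explicit the sense in which the Gibbs algorithm exponentially penalizes training error, with $\beta$ controlling how sharply poor hypotheses are suppressed.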