Bias-variance trade-off characterization in a classification problem: what differences with regression?

An important issue in machine learning theory is the so-called bias-variance trade-off, which shows that a model with many degrees of freedom often has poor generalization capabilities. In learning problems that use the quadratic loss function, the well-known noise-bias-variance decomposition of the mean squared error sheds light on the nature of the model's expected error. This gives insight into regression modelling, where the quadratic loss is particularly appropriate. In classification problems, however, the results of this decomposition do not apply, since the appropriate loss function is the zero-one loss. Several attempts to decompose the zero-one loss into a sum of noise, bias, and variance terms were proposed between 1995 and 2000, until a general framework was put forward by Domingos in 2000. This report gives an account of the general framework proposed by Domingos, in the light of the earlier solutions. Two major points of interest of this theoretical account of bias-variance decomposition are, first, that the notion of bias needs to be redefined in classification problems and, second, that given appropriate definitions of noise, bias, and variance, it is possible to unify different decompositions (among them the quadratic and the zero-one) within a single general theoretical framework.
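As a brief illustrative sketch (the notation below follows the definitions usually attributed to Domingos (2000), not necessarily the exact symbols used later in this report): for an example $x$ with noisy target $t$, optimal prediction $y^{*}$, learner prediction $y$ over training sets $D$, and main prediction $y_{m} = \arg\min_{y'} \mathrm{E}_{D}[L(y, y')]$, define
\[
N(x) = \mathrm{E}_{t}\!\left[L(t, y^{*})\right], \qquad
B(x) = L(y^{*}, y_{m}), \qquad
V(x) = \mathrm{E}_{D}\!\left[L(y_{m}, y)\right].
\]
The unified decomposition then reads
\[
\mathrm{E}_{D,t}\!\left[L(t, y)\right] = c_{1}\, N(x) + B(x) + c_{2}\, V(x),
\]
which recovers the classical additive decomposition for the quadratic loss ($c_{1} = c_{2} = 1$), while for the two-class zero-one loss $c_{1} = 2\,P_{D}(y = y^{*}) - 1$ and $c_{2} = +1$ on unbiased examples ($y_{m} = y^{*}$) but $-1$ on biased ones, so that variance can actually reduce the expected error where the model is biased.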