An important issue in machine learning theory is the so-called bias-variance trade-off, which shows that a model with many degrees of freedom often has poor generalization capabilities. In learning problems using the quadratic loss function, the well-known noise-bias-variance decomposition of the mean squared error sheds light on the nature of the model's expected error. This gives insight into regression modelling, where the quadratic loss function is particularly appropriate. In classification problems, however, this decomposition does not apply, since the appropriate loss function is the zero-one loss. Several decompositions of the zero-one loss into noise, bias, and variance terms were proposed between 1995 and 2000 [2, 3, 8], until Domingos introduced an elegant general framework in 2000 [4, 5]. This report is an account of Domingos's framework, in the light of the previously proposed solutions. Two major insights of this theoretical account of bias-variance decomposition are: first, that the notion of bias needs to be redefined for classification problems and, second, that given appropriate definitions of noise, bias, and variance, different decompositions (among which the quadratic and zero-one ones) can be unified within a single general theoretical framework.
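To make the framework concrete, here is a brief sketch of Domingos's definitions, in notation of our own choosing (the symbols below are ours, not quoted from [4]). For a loss function L, let t denote the true value at a point x, y* the optimal prediction, and y the prediction of a model trained on a random training set D. The main prediction is y_m = argmin_{y'} E_D[L(y, y')], i.e. the single prediction that on average disagrees least with the model's predictions across training sets, and

    N(x) = E_t[L(t, y*)]     (noise)
    B(x) = L(y*, y_m)        (bias)
    V(x) = E_D[L(y_m, y)]    (variance)

Domingos shows that the expected loss then decomposes as

    E_{t,D}[L(t, y)] = c_1 N(x) + B(x) + c_2 V(x)

with loss-dependent coefficients. For squared loss, c_1 = c_2 = 1 and B(x) = (y* - y_m)^2, which recovers the classical noise + squared bias + variance decomposition. For two-class zero-one loss, c_1 = 2 P_D(y = y*) - 1 and c_2 = +1 at unbiased points (y_m = y*) but c_2 = -1 at biased points (y_m ≠ y*), so that variance can actually reduce the expected error precisely where the model is biased. Note that bias is here a property of the main prediction rather than of an averaged model, which is what allows the same definition to serve both loss functions.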
[1] L. Ryd et al. On bias. Acta Orthopaedica Scandinavica, 1994.
[2] Thomas G. Dietterich et al. Error-Correcting Output Coding Corrects Bias and Variance. ICML, 1995.
[3] Ron Kohavi et al. Bias Plus Variance Decomposition for Zero-One Loss Functions. ICML, 1996.
[4] Pedro M. Domingos. A Unified Bias-Variance Decomposition for Zero-One and Squared Loss. AAAI/IAAI, 2000.
[5] Pedro M. Domingos. A Unified Bias-Variance Decomposition and its Applications. ICML, 2000.
[6] Johannes Fürnkranz. Round Robin Classification. J. Mach. Learn. Res., 2002.
[7] Philip S. Yu et al. Mining concept-drifting data streams using ensemble classifiers. KDD '03, 2003.
[8] Jerome H. Friedman. On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery, 1997.
[9] Leo Breiman. Bagging Predictors. Machine Learning, 1996.