Generalization is at the core of evaluation: we estimate the performance of a model on data we have never seen but expect to encounter later on. Our current evaluation procedures assume that the data already seen is a random sample of the domain from which all future data will be drawn. Unfortunately, in practical situations this is rarely the case. Changes in the underlying probabilities will occur, and we must evaluate how robust our models are to such differences. This paper takes the position that models should be robust in two senses. Firstly, small changes in the joint probabilities should not cause large changes in performance. Secondly, when the dependencies between the attributes and the class are constant and only the marginals change, simple adjustments should be sufficient to restore a model’s performance. This paper is intended to generate debate on how measures of robustness might become part of our normal evaluation procedures. Certainly, some clear demonstrations of robustness would improve our confidence in our models’ practical merits.
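To make the second sense of robustness concrete, here is a minimal sketch (not taken from the paper; the function and variable names are illustrative) of the standard prior-shift correction: when the class-conditional dependencies are unchanged and only the class marginal shifts, a classifier's posterior estimates can be reweighted by the ratio of deployment-time to training-time class priors and renormalized.

```python
import numpy as np

def adjust_for_prior_shift(posteriors, old_priors, new_priors):
    """Reweight class-posterior estimates when only the class marginal changes.

    posteriors : (n_samples, n_classes) array of p_old(y | x) from the trained model
    old_priors : (n_classes,) class distribution seen at training time
    new_priors : (n_classes,) class distribution expected at deployment

    Returns renormalized estimates proportional to
    p_old(y | x) * new_priors / old_priors.
    """
    weights = np.asarray(new_priors, dtype=float) / np.asarray(old_priors, dtype=float)
    adjusted = np.asarray(posteriors, dtype=float) * weights
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Hypothetical example: a model trained on balanced classes, deployed
# where the positive class has become rare.
p_old = np.array([[0.3, 0.7],   # model's posteriors for two examples
                  [0.8, 0.2]])
print(adjust_for_prior_shift(p_old, old_priors=[0.5, 0.5], new_priors=[0.9, 0.1]))
```

Under the stated assumption that only the marginals change, such an adjustment is "simple" in the sense the paper intends: it requires no retraining, only an estimate of the new class distribution.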