Cost-sensitive learning by cost-proportionate example weighting

We propose and evaluate a family of methods for converting classifier learning algorithms and classification theory into cost-sensitive algorithms and theory. The proposed conversion is based on cost-proportionate weighting of the training examples, which can be realized either by feeding the weights to the classification algorithm (as often done in boosting) or by careful subsampling. We give theoretical performance guarantees for the proposed methods, as well as empirical evidence that they are practical alternatives to existing approaches. In particular, we propose costing, a method based on cost-proportionate rejection sampling and ensemble aggregation, which achieves excellent predictive performance on two publicly available datasets while requiring drastically less computation than existing methods.
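
As a concrete illustration of the subsampling realization, the sketch below implements cost-proportionate rejection sampling and the costing ensemble. This is our own minimal sketch, not the authors' code: it assumes binary labels in {0, 1}, a per-example cost array, and scikit-learn's DecisionTreeClassifier as the base learner; the normalizer Z is taken to be the maximum cost, and the helper names rejection_sample, costing, and predict are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rejection_sample(X, y, costs, Z, rng):
    """Keep each example independently with probability cost / Z,
    where Z is an upper bound on the per-example costs."""
    keep = rng.random(len(costs)) < costs / Z
    return X[keep], y[keep]

def costing(X, y, costs, n_members=10, seed=0):
    """Train one base classifier per rejection-sampled subsample;
    the returned list of members is aggregated by voting at test time."""
    rng = np.random.default_rng(seed)
    Z = costs.max()
    members = []
    for _ in range(n_members):
        Xs, ys = rejection_sample(X, y, costs, Z, rng)
        # In practice one would guard against degenerate subsamples
        # (empty, or containing only a single class).
        members.append(DecisionTreeClassifier(random_state=seed).fit(Xs, ys))
    return members

def predict(members, X):
    """Majority vote over the ensemble members."""
    votes = np.mean([m.predict(X) for m in members], axis=0)
    return (votes >= 0.5).astype(int)
```

Each subsample has expected size sum(costs) / Z, which is where the computational savings come from when a few examples carry most of the total cost. The alternative weighting realization corresponds to passing the costs directly to the learner, e.g. via the sample_weight argument that many scikit-learn estimators accept.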
