An iterative method for multi-class cost-sensitive learning

Cost-sensitive learning addresses classification problems in which different types of misclassification incur different costs. In this paper, we present a method for solving multi-class cost-sensitive learning problems using any binary classification algorithm. The algorithm is derived from three key ideas: 1) iterative weighting; 2) expanding the data space; and 3) gradient boosting with stochastic ensembles. We establish theoretical guarantees on the performance of this method. In particular, we show that a certain variant possesses the boosting property, given a form of weak learning assumption on the component binary classifier. We also evaluate the proposed method empirically on benchmark data sets and verify that it generally achieves better results than representative cost-sensitive learning methods, both in predictive performance (cost minimization) and, in many cases, in computational efficiency.
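
To make the "expanding data space" idea concrete, the following is a minimal sketch of that reduction alone: each multi-class example is expanded into one weighted binary example per candidate class, and any weighted binary learner is trained on the expanded set. This is an illustrative simplification, not the paper's exact algorithm; the iterative reweighting and the stochastic-ensemble gradient boosting steps are omitted, and the helper names (expand, fit_expanded, predict_expanded), the use of a scikit-learn decision tree as the binary learner, the cost-matrix convention costs[i, y] = cost of predicting class y on example i, and the particular weighting scheme are all assumptions made for the sketch.

```python
# Sketch of a data-space-expansion reduction from multi-class
# cost-sensitive learning to weighted binary classification.
# Single pass only; no iterative weighting or ensembling.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def expand(X, costs):
    """Turn each (example, cost vector) pair into k weighted binary examples.

    costs[i, y] is the cost of predicting class y on example i.
    Label 1 marks the cheapest class; the weight is the cost gap,
    so costly mistakes dominate the binary learner's objective.
    """
    n, k = costs.shape
    rows, labels, weights = [], [], []
    for i in range(n):
        best, worst = costs[i].min(), costs[i].max()
        for y in range(k):
            one_hot = np.zeros(k)
            one_hot[y] = 1.0
            rows.append(np.concatenate([X[i], one_hot]))  # features + class code
            if costs[i, y] == best:
                labels.append(1)
                weights.append(worst - best)
            else:
                labels.append(0)
                weights.append(costs[i, y] - best)
    # small floor so sample weights are never all zero
    return np.array(rows), np.array(labels), np.array(weights) + 1e-12

def fit_expanded(X, costs):
    Xe, ye, we = expand(X, costs)
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    clf.fit(Xe, ye, sample_weight=we)
    return clf

def predict_expanded(clf, X, n_classes):
    """Predict the class whose expanded example the binary learner scores highest."""
    preds = []
    for x in X:
        cand = np.vstack([np.concatenate([x, np.eye(n_classes)[y]])
                          for y in range(n_classes)])
        scores = clf.predict_proba(cand)[:, 1]  # assumes both binary labels were seen
        preds.append(int(np.argmax(scores)))
    return np.array(preds)

if __name__ == "__main__":
    # toy data: 3 classes, mistakes that predict class 2 are 5x as costly
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    true_y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 2] > 0.5).astype(int)
    costs = np.ones((200, 3))
    costs[np.arange(200), true_y] = 0.0
    costs[:, 2] *= 5.0
    clf = fit_expanded(X, costs)
    preds = predict_expanded(clf, X, 3)
    print("average cost:", costs[np.arange(200), preds].mean())
```

Roughly speaking, the paper's full method does not stop at this single pass: the example weights are updated iteratively, with the updates driven by gradient boosting over a stochastic ensemble of such binary classifiers.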
