Learning decision trees for loss minimization in multi-class problems

Many machine learning applications require classifiers that minimize an asymmetric loss function rather than the raw misclassification rate. We study methods for modifying C4.5 to incorporate arbitrary loss matrices. One way to incorporate loss information into C4.5 is to manipulate the weights assigned to the examples from different classes. For 2-class problems, this works for any loss matrix, but for k > 2 classes, it is not sufficient. Nonetheless, we ask what is the set of class weights that best approximates an arbitrary k × k loss matrix, and we test and compare several methods: a wrapper method and some simple heuristics. The best method is the wrapper method, which directly optimizes the loss using a holdout data set. We define a complexity measure for loss matrices and show that this measure can predict when more efficient methods will suffice and when the wrapper method must be applied.
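To make the class-weighting idea concrete, here is a minimal illustrative sketch (not the paper's C4.5 implementation) of folding a 2-class loss matrix into per-class example weights for a decision tree. The loss matrix values and the use of scikit-learn's `DecisionTreeClassifier` are assumptions for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical asymmetric loss matrix for illustration:
# loss[i][j] = cost of predicting class j when the true class is i.
loss = np.array([[0.0, 1.0],
                 [5.0, 0.0]])  # misclassifying class 1 is 5x as costly

# For a 2-class problem, weight each class by the cost of
# misclassifying an example of that class.
class_weight = {0: loss[0, 1], 1: loss[1, 0]}

# Toy 1-D training data (assumed for the sketch).
X = np.array([[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The weighted tree now penalizes errors on class 1 more heavily
# during split selection, mimicking loss-sensitive training.
clf = DecisionTreeClassifier(class_weight=class_weight,
                             random_state=0).fit(X, y)
```

For k > 2 classes there is no exact weight assignment for an arbitrary loss matrix, which is what motivates the approximations and the wrapper method compared in the paper.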