Paper Information - Cost-Sensitive Classifier Selection Using the ROC Convex Hull Method

One binary classifier may be preferred to another based on the fact that it has better prediction accuracy than its competitor. Without additional information describing the cost of a misclassification, accuracy alone as a selection criterion may not be a sufficiently robust measure when the distribution of classes is greatly skewed or the costs of different types of errors may be significantly different. The receiver operating characteristic (ROC) curve is often used to summarize binary classifier performance due to its ease of interpretation, but does not include misclassification cost information in its formulation. Provost and Fawcett [5, 7] have developed the ROC Convex Hull (ROCCH) method that incorporates techniques from ROC curve analysis, decision analysis, and computational geometry in the search for the optimal classifier that is robust with respect to skewed or imprecise class distributions and disparate misclassification costs. We apply the ROCCH method to several datasets using a variety of modeling tools to build binary classifiers and compare their performances using misclassification costs. We support Provost, Fawcett, and Kohavi's claim [6] that classifier accuracy, as represented by the area under the ROC curve, is not an optimal criterion in itself for choosing a classifier, and that by using the ROCCH method, a more appropriate classifier may be found that realistically reflects class distribution and misclassification costs.
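The idea behind the ROCCH method can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each classifier is summarized by one (false positive rate, true positive rate) point, and it uses the standard expected-cost expression p(+)·(1 − TPR)·c(FN) + p(−)·FPR·c(FP), which the abstract does not state. The function names, the sample points, and the cost values are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of ROC points, plus the trivial classifiers (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # joining the point before it to the new point p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def expected_cost(point: Point, p_pos: float, cost_fn: float, cost_fp: float) -> float:
    """Expected misclassification cost per example at one operating point."""
    fpr, tpr = point
    return p_pos * (1.0 - tpr) * cost_fn + (1.0 - p_pos) * fpr * cost_fp


def best_operating_point(points: List[Point], p_pos: float,
                         cost_fn: float, cost_fp: float) -> Point:
    """Hull vertex with the lowest expected cost for the given prior and costs."""
    hull = roc_convex_hull(points)
    return min(hull, key=lambda pt: expected_cost(pt, p_pos, cost_fn, cost_fp))


# Hypothetical classifiers; (0.5, 0.7) is dominated and falls below the hull.
classifiers = [(0.1, 0.6), (0.3, 0.8), (0.5, 0.7), (0.6, 0.95)]
hull = roc_convex_hull(classifiers)

# Rare positives with equal costs: predicting "negative" for everything is cheapest.
equal_costs_choice = best_operating_point(classifiers, p_pos=0.1, cost_fn=1.0, cost_fp=1.0)

# Same prior, but a missed positive costs 20 times more: a liberal classifier is chosen.
skewed_costs_choice = best_operating_point(classifiers, p_pos=0.1, cost_fn=20.0, cost_fp=1.0)
```

In this sketch the selected classifier changes when only the cost ratio changes, which is the sense in which a single accuracy figure may be an insufficient selection criterion.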

Ross Bettinger

[1] S. J. Wyard. Medical Images: Formation, Perception and Measurement, 1977.

[2] David J. Spiegelhalter, et al. Machine Learning, Neural and Statistical Classification, 2009.

[3] Tom Fawcett, et al. Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions, 1997, KDD.

[4] Ron Kohavi, et al. The Case against Accuracy Estimation for Comparing Induction Algorithms, 1998, ICML.

[5] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[6] Niall M. Adams, et al. Comparing classifiers when the misallocation costs are uncertain, 1999, Pattern Recognit.

[7] Tom Fawcett, et al. Robust Classification for Imprecise Environments, 2000, Machine Learning.