On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers

Receiver Operating Characteristic (ROC) curves are a standard way to display the performance of a set of binary classifiers for all feasible ratios of the costs associated with false positives and false negatives. For linear classifiers, this set is typically obtained by training once, holding the estimated slope constant, and then varying the intercept to obtain a parameterized family of classifiers whose performances can be plotted in the ROC plane. We consider the alternative of varying the asymmetry of the cost function used for training. We show that the ROC curve obtained by varying both the intercept and the asymmetry, and hence the slope, always outperforms the ROC curve obtained by varying only the intercept. In addition, we present a path-following algorithm for the support vector machine (SVM) that can efficiently compute the entire ROC curve and that has the same computational complexity as training a single classifier. Finally, we provide a theoretical analysis of the relationship between the asymmetric cost model assumed when training a classifier and the cost model assumed in applying the classifier. In particular, we show that the mismatch between the step function used for testing and the convex upper bounds usually used for training leads to a provable and quantifiable difference around extreme asymmetries.

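As a rough illustration of the two families of classifiers contrasted above, the following sketch compares (a) training once and sweeping the decision threshold (equivalent to varying the intercept) with (b) retraining under different cost asymmetries. It is not the paper's path-following algorithm: scikit-learn's LinearSVC stands in for the linear classifier, class weights stand in for cost asymmetry, and the synthetic dataset, weight grid, and parameter values are illustrative assumptions.

```python
# Sketch only: compares an intercept-swept ROC curve with ROC points obtained
# by retraining under different cost asymmetries (via class weights).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_curve

# Illustrative synthetic data (assumption, not the paper's benchmarks).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# (a) Train once, then sweep the threshold on the decision values;
#     this is the usual "vary only the intercept" ROC curve.
base = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)
scores = base.decision_function(X_te)
fpr_int, tpr_int, _ = roc_curve(y_te, scores)

# (b) Retrain with varying cost asymmetry, which changes the slope as well as
#     the intercept; record one (FPR, TPR) point per asymmetry value.
points = []
for w_pos in np.logspace(-2, 2, 15):  # asymmetry grid (assumed)
    clf = LinearSVC(C=1.0, max_iter=10000,
                    class_weight={0: 1.0, 1: w_pos}).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    fp = np.sum((pred == 1) & (y_te == 0))
    tp = np.sum((pred == 1) & (y_te == 1))
    points.append((fp / np.sum(y_te == 0), tp / np.sum(y_te == 1)))

print("intercept-only curve: %d points" % len(fpr_int))
print("cost-asymmetry points:", points[:3], "...")
```

In the setting studied here, one would additionally sweep the intercept for each asymmetry and take the upper envelope of all resulting (FPR, TPR) points; the loop above retrains from scratch for every asymmetry value, which is exactly the repeated cost that the path-following algorithm avoids.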