Maximizing the area under the ROC curve by pairwise feature combination

Most available classification systems focus on minimizing the classification error rate. This is not always a suitable metric, especially when dealing with two-class problems with skewed class and cost distributions. In this case, an effective criterion for measuring the quality of a decision rule is the area under the Receiver Operating Characteristic curve (AUC), which is also useful for measuring the ranking quality of a classifier, as required in many real applications. In this paper we propose a nonparametric linear classifier based on the maximization of the AUC. The approach relies on the analysis of the Wilcoxon–Mann–Whitney statistic of each single feature and on an iterative pairwise coupling of the features to optimize the ranking of the combined feature. Because of this pairwise feature evaluation, the proposed procedure is essentially different from other classifiers that use the AUC as a criterion. Experiments performed on synthetic and real data sets, and comparisons with previous approaches, confirm the effectiveness of the proposed method. © 2007 Elsevier Ltd. All rights reserved.
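To make the two ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the standard Wilcoxon–Mann–Whitney estimate of the AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as one half) and replaces the paper's pairwise optimization, which the abstract does not detail, with a plain grid search over a single weight. The names `wmw_auc`, `best_pair_weight`, and the weight `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def wmw_auc(pos, neg):
    """Wilcoxon-Mann-Whitney estimate of the AUC for a scalar score.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties contribute one half.
    """
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def best_pair_weight(pos, neg, i, j, grid=None):
    """Grid-search the weight alpha of the combined feature x_i + alpha * x_j.

    Assumption: a brute-force search over alpha in [-1, 1] stands in for
    whatever optimization the paper actually uses.
    Returns (alpha, auc) for the first grid point with maximal AUC.
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    if grid is None:
        grid = np.linspace(-1.0, 1.0, 201)
    aucs = [
        wmw_auc(pos[:, i] + a * pos[:, j], neg[:, i] + a * neg[:, j])
        for a in grid
    ]
    k = int(np.argmax(aucs))
    return float(grid[k]), aucs[k]


# Toy data (made up): neither feature ranks the classes well on its own,
# but a weighted sum of the two separates them.
pos = np.array([[2.0, 0.0], [0.0, 2.0]])
neg = np.array([[1.0, 0.0], [0.0, 1.0]])

single_auc = wmw_auc(pos[:, 0], neg[:, 0])
alpha, pair_auc = best_pair_weight(pos, neg, 0, 1)
print(single_auc, alpha, pair_auc)
```

An iterative scheme in the spirit of the abstract could repeat this step, treating the combined feature as a new single feature and coupling it with the next one, but the order of coupling and the stopping rule are not stated in the abstract and are left open here.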