Tuning the hyperparameter of an AUC-optimized classifier

The area under the ROC curve (AUC) is a good alternative to the standard empirical risk (classification error) as a performance criterion for classifiers. While most classifier formulations aim at minimizing the classification error, few methods exist that directly optimize the AUC. Moreover, the methods reported to optimize the AUC are often inefficient, even for moderately sized datasets. In this paper, we discuss a classifier that optimizes the AUC using a linear programming formulation in which the classification constraints can easily be subsampled. Furthermore, this approach makes it possible to use the unused constraints for optimizing the hyperparameters. In a performance evaluation, we compare the AUC-optimized Linear Programming Classifier (AUC-LPC) to other classifiers on several real-world datasets.
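The idea of subsampling constraints can be illustrated with a minimal sketch. It assumes that each constraint corresponds to one (positive, negative) pair of objects that the classifier should rank correctly, so that the AUC equals the fraction of satisfied pairs. The function names `pairwise_auc` and `split_constraints` and the toy scores are hypothetical and chosen only for illustration; the sketch does not solve the linear program itself and is not the AUC-LPC formulation.

```python
import random
from itertools import product


def pairwise_auc(pos_scores, neg_scores, pairs=None):
    """Fraction of (positive, negative) pairs ranked correctly; a tie counts 0.5.

    If `pairs` is None, all pairs are used, which gives the usual empirical AUC.
    """
    if pairs is None:
        pairs = product(range(len(pos_scores)), range(len(neg_scores)))
    total = 0
    correct = 0.0
    for i, j in pairs:
        total += 1
        if pos_scores[i] > neg_scores[j]:
            correct += 1.0
        elif pos_scores[i] == neg_scores[j]:
            correct += 0.5
    return correct / total


def split_constraints(n_pos, n_neg, n_train, seed=0):
    """Randomly split all pairwise constraints into a training and a held-out part."""
    all_pairs = list(product(range(n_pos), range(n_neg)))
    random.Random(seed).shuffle(all_pairs)
    return all_pairs[:n_train], all_pairs[n_train:]


# Toy classifier outputs (assumed values, not taken from the paper).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]

train_pairs, heldout_pairs = split_constraints(len(pos), len(neg), n_train=6)
full_auc = pairwise_auc(pos, neg)                    # AUC over all 12 pairs
heldout_auc = pairwise_auc(pos, neg, heldout_pairs)  # estimate on the unused pairs
```

In a setup like this, a classifier would be trained on `train_pairs` for each candidate hyperparameter value, and the value with the best AUC estimate on `heldout_pairs` would be kept.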
