Ranking Instances by Maximizing the Area under ROC Curve

In recent years, the problem of learning a real-valued function that induces a ranking over an instance space has gained importance in the machine learning literature. We propose a supervised algorithm, called ranking instances by maximizing the area under the ROC curve (RIMARC), that learns such a ranking function. Since the area under the ROC curve (AUC) is a widely accepted measure of ranking quality, the algorithm aims to maximize the AUC value directly. For a single categorical feature, we show the necessary and sufficient condition that any ranking function must satisfy to achieve the maximum AUC. We also sketch a method for discretizing a continuous feature so that it reaches the maximum AUC as well. RIMARC uses a heuristic to extend this maximization to all features of a data set. The ranking function learned by RIMARC is in a human-readable form; it therefore provides valuable information to domain experts for decision making. RIMARC is evaluated on many real-life data sets and compared with several state-of-the-art algorithms. In terms of AUC, RIMARC achieves significantly better performance than other similar methods.
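As a rough illustration of the single-categorical-feature case, the sketch below computes AUC as the Wilcoxon-Mann-Whitney statistic and scores each category by its fraction of positive instances. The abstract does not spell out the paper's condition, so this scoring rule is an assumption made here for illustration, and the function names `auc` and `positive_fraction_scores` as well as the toy data are hypothetical. A brute-force check over all strict orderings of the categories confirms that, on this toy data, no ordering gives a higher AUC.

```python
from collections import defaultdict
from itertools import permutations


def auc(scores, labels):
    """AUC as the Wilcoxon-Mann-Whitney statistic.

    Fraction of (positive, negative) pairs in which the positive instance
    has the higher score; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def positive_fraction_scores(values, labels):
    """Map each categorical value to its fraction of positive instances.

    Assumption for illustration only; not taken from the abstract.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for v, y in zip(values, labels):
        counts[v][0] += y
        counts[v][1] += 1
    return {v: p / t for v, (p, t) in counts.items()}


# Hypothetical toy data: one categorical feature and binary labels.
values = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1]

score_of = positive_fraction_scores(values, labels)
auc_pf = auc([score_of[v] for v in values], labels)

# Brute force: try every strict ordering of the categories as a ranking.
categories = sorted(set(values))
best_auc = max(
    auc([dict(zip(order, range(len(order))))[v] for v in values], labels)
    for order in permutations(categories)
)

print(auc_pf, best_auc)
```

The brute-force search is only feasible for a handful of categories; it is here purely to check the scoring rule on the toy data, not as part of any learning procedure.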
