Anomaly Ranking as Supervised Bipartite Ranking

The Mass Volume (MV) curve is a visual tool for evaluating how well a scoring function ranks data in the same order as the underlying density function. Anomaly ranking is the unsupervised learning task of building, from unlabeled data, a scoring function whose MV curve is as low as possible at every point. In this paper, it is proved that, when the data-generating probability distribution has compact support, anomaly ranking is equivalent to (supervised) bipartite ranking, where the goal is to discriminate between the underlying probability distribution and the uniform distribution with the same support. In this situation, the MV curve can then be seen as a simple transform of the corresponding ROC curve. Exploiting this view, we then show how to use bipartite ranking algorithms, possibly combined with random sampling, to solve the MV curve minimization problem. Numerical experiments based on a variety of bipartite ranking algorithms that are well documented in the literature illustrate the relevance of our approach.
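The reduction described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the triangular data distribution on [0, 1], the histogram-based scorer standing in for a bipartite ranking algorithm, and all function names are assumptions chosen for illustration. Observed data are labeled positive, points sampled uniformly on the support are labeled negative, and a scorer is fitted to separate them. Because the negatives are uniform, the false positive rate of a set {s >= t} estimates its volume divided by the volume of the support, so the MV value at mass level alpha is read off as the support volume times the false positive rate at true positive rate alpha.

```python
import math
import random


def sample_data(n, rng):
    # Hypothetical data distribution on [0, 1]: triangular, peaked at 0.5.
    return [rng.triangular(0.0, 1.0, 0.5) for _ in range(n)]


def sample_uniform(n, rng):
    # "Negative" class: uniform draws on the same support [0, 1].
    return [rng.random() for _ in range(n)]


def bin_index(x, bins):
    return min(int(x * bins), bins - 1)


def fit_histogram_scorer(pos, neg, bins=20):
    # Stand-in bipartite scorer (an assumption, not a method from the paper):
    # the smoothed fraction of positives falling in each histogram bin.
    pos_counts = [0] * bins
    neg_counts = [0] * bins
    for x in pos:
        pos_counts[bin_index(x, bins)] += 1
    for x in neg:
        neg_counts[bin_index(x, bins)] += 1

    def score(x):
        b = bin_index(x, bins)
        return (pos_counts[b] + 1) / (pos_counts[b] + neg_counts[b] + 2)

    return score


def empirical_mv(score, pos, neg, alpha, support_volume=1.0):
    # Pick the threshold whose level set {score >= t} holds at least a
    # fraction alpha of the data (true positive rate >= alpha).
    pos_scores = sorted((score(x) for x in pos), reverse=True)
    k = max(1, math.ceil(alpha * len(pos_scores)))
    threshold = pos_scores[k - 1]
    # The share of uniform points in that set (false positive rate)
    # estimates its volume relative to the support.
    fpr = sum(score(x) >= threshold for x in neg) / len(neg)
    return support_volume * fpr


rng = random.Random(0)
score = fit_histogram_scorer(sample_data(2000, rng), sample_uniform(2000, rng))
test_pos, test_neg = sample_data(2000, rng), sample_uniform(2000, rng)
for alpha in (0.2, 0.5, 0.9):
    print(alpha, empirical_mv(score, test_pos, test_neg, alpha))
```

A scorer that ranks no better than chance would give an MV value close to alpha times the support volume, so values well below that indicate that the scorer concentrates mass in small-volume regions. Any bipartite ranking algorithm could replace the histogram scorer in this sketch.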
