Surrogate Functions for Maximizing Precision at the Top

The problem of maximizing precision at the top of a ranked list, often dubbed Precision@k (prec@k), finds relevance in myriad learning applications such as ranking, multi-label classification, and learning with severe label imbalance. However, despite its popularity, there exist significant gaps in our understanding of this problem and its associated performance measure. The most notable of these is the lack of a convex upper bounding surrogate for prec@k. We also lack scalable perceptron and stochastic gradient descent algorithms for optimizing this performance measure. In this paper we make key contributions in these directions. At the heart of our results is a family of truly upper bounding surrogates for prec@k. These surrogates are motivated in a principled manner and enjoy attractive properties such as consistency to prec@k under various natural margin/noise conditions. These surrogates are then used to design a class of novel perceptron algorithms for optimizing prec@k with provable mistake bounds. We also devise scalable stochastic gradient descent style methods for this problem with provable convergence bounds. Our proofs rely on novel uniform convergence bounds which require an in-depth analysis of the structural properties of prec@k and its surrogates. We conclude with experimental results comparing our algorithms with state-of-the-art cutting plane and stochastic gradient algorithms for maximizing prec@k.
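
For concreteness, given real-valued scores s over n instances with binary labels y in {-1, +1}^n, prec@k is the fraction of positives among the k highest-scoring instances: prec@k(s, y) = (1/k) |{i in top-k(s) : y_i = +1}|. The following minimal sketch computes this standard definition; it illustrates the performance measure itself, not the surrogates or algorithms proposed in the paper, and the helper name prec_at_k is ours.

import numpy as np

def prec_at_k(scores, labels, k):
    # Fraction of positive labels among the k highest-scoring instances.
    # labels are assumed to take values in {-1, +1}.
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    return np.mean(labels[top_k] == 1)

# Example: 3 of the top-4 scoring instances are positive, so prec@4 = 0.75.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
labels = np.array([1, 1, -1, 1, -1])
print(prec_at_k(scores, labels, k=4))  # 0.75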
