Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking

Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. The pair-wise approach to bipartite ranking constructs a quadratic number of pairs, which is infeasible for large-scale data sets. The point-wise approach, albeit more efficient, often results in inferior ranking performance. That is, it is difficult to conduct bipartite ranking accurately and efficiently at the same time. In this paper, we develop a novel active sampling scheme within the pair-wise approach to conduct bipartite ranking efficiently. The scheme is inspired by active learning and reaches competitive ranking performance while focusing on only a small subset of the many pairs during training. Moreover, we propose a general Combined Ranking and Classification (CRC) framework to conduct bipartite ranking accurately. The framework unifies the point-wise and pair-wise approaches and is based on the simple idea of treating each instance point as a pseudo-pair. Experiments on 14 real-world large-scale data sets demonstrate that the proposed algorithm of Active Sampling within CRC, when coupled with a linear Support Vector Machine, usually outperforms state-of-the-art point-wise and pair-wise ranking approaches in both accuracy and efficiency.
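The pseudo-pair idea behind CRC can be illustrated with a short sketch. The following Python code (using NumPy and scikit-learn's LinearSVC as the linear SVM) is only a minimal illustration under stated assumptions: pair-difference vectors x+ - x- labelled +1 are stacked with the original labelled points, which act as pseudo-pairs against the origin, and a single linear SVM is trained on the union. The helper names (sample_pair_features, crc_training_set), the uniform random pair sampling, and all hyper-parameters are assumptions made for illustration; they do not reproduce the paper's actual active sampling procedure.

    # Minimal sketch of the CRC pseudo-pair construction with a linear SVM.
    # The pair-sampling step is plain uniform random sampling and only a
    # placeholder; the paper replaces it with an active sampling scheme.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    def sample_pair_features(X, y, n_pairs):
        """Draw random (positive, negative) pairs and return their difference
        vectors labelled +1, i.e. the usual pair-wise reduction of bipartite
        ranking.  (One could also add the reversed differences with label -1.)"""
        pos, neg = np.where(y == 1)[0], np.where(y == -1)[0]
        i = rng.choice(pos, size=n_pairs)
        j = rng.choice(neg, size=n_pairs)
        return X[i] - X[j], np.ones(n_pairs)

    def crc_training_set(X, y, n_pairs):
        """Combined Ranking and Classification: each original point x with label
        y is kept as a pseudo-pair (x - 0, y) and stacked with the sampled pair
        differences, so one linear SVM fits both objectives at once."""
        P, yp = sample_pair_features(X, y, n_pairs)
        return np.vstack([X, P]), np.concatenate([y, yp])

    # toy data: 200 points, 10 features, labels in {-1, +1}
    X = rng.standard_normal((200, 10))
    y = np.where(X[:, 0] + 0.3 * rng.standard_normal(200) > 0, 1, -1)

    Xtr, ytr = crc_training_set(X, y, n_pairs=500)
    clf = LinearSVC(C=1.0).fit(Xtr, ytr)
    scores = clf.decision_function(X)   # X @ w; sorting by these scores ranks the points

In the paper, the uniform pair sampling above is where the proposed active sampling scheme enters: instead of drawing pairs at random, training concentrates on a small, informative subset of the quadratically many pairs.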
