Perturbation based Large Margin Approach for Ranking

Applying the standard structured-output hinge loss to the learning-to-rank problem runs into two main caveats. (a) The label space, the set of all possible permutations of the items to be ranked, is very large (n! permutations for n items) and is less amenable to the dynamic-programming techniques usually used for structured outputs. (b) The supervision, or training data, provides multiple labels per input instead of a single label: for example, when several items share the same relevance grade, every ordering that sorts the items by grade is correct. Unfortunately, the most natural way to handle such multiple labels leads to a nonconvex surrogate. In this paper, we propose a general class of perturbation-based surrogates that leverage the large margin approach and are convex. We show that the standard hinge surrogate for classification falls within this class. We also identify a surrogate in this class for the ranking problem that does not suffer from the caveats mentioned above. Indeed, our experiments show that it performs better than other candidate large margin proposals on both synthetic and real-world ranking datasets.
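The two caveats can be made concrete with a small brute-force sketch. It is not the perturbation-based surrogate proposed here. It implements the baseline the abstract argues against, under assumptions chosen only for illustration: a linear scorer whose score vector s is manipulated directly (equivalently, w with an identity feature matrix), a joint feature map psi(x, pi) = sum_k c_k x_{pi(k)} with DCG-style position weights c_k = 1/log2(k+2), and a task loss Delta equal to the fraction of mis-ordered relevance pairs. All function names are hypothetical. Enumerating all n! permutations illustrates caveat (a). Taking the minimum of the structured hinge over every permutation consistent with tied relevance grades, one natural multi-label extension, illustrates caveat (b).

```python
import itertools
import math


def position_weights(n):
    # Assumed DCG-style position discounts c_k = 1 / log2(k + 2).
    return [1.0 / math.log2(k + 2) for k in range(n)]


def joint_score(scores, perm, c):
    # <w, psi(x, pi)> = sum_k c_k * s[pi[k]], where perm[k] is the item at position k.
    return sum(ck * scores[item] for ck, item in zip(c, perm))


def pairwise_loss(rel, perm):
    # Task loss Delta: fraction of pairs with rel[i] > rel[j] that perm ranks j above i.
    pos = {item: k for k, item in enumerate(perm)}
    pairs = [(i, j) for i in range(len(rel)) for j in range(len(rel)) if rel[i] > rel[j]]
    if not pairs:
        return 0.0
    return sum(pos[i] > pos[j] for i, j in pairs) / len(pairs)


def is_correct(rel, perm):
    # A permutation is a correct label if it lists items in non-increasing relevance.
    return all(rel[perm[k]] >= rel[perm[k + 1]] for k in range(len(perm) - 1))


def structured_hinge(scores, rel, target):
    # Single-label structured hinge, convex in the scores:
    # max_pi [Delta(pi) + <w, psi(pi)>] - <w, psi(target)>, brute force over all n! labels.
    c = position_weights(len(scores))
    most_violating = max(
        pairwise_loss(rel, p) + joint_score(scores, p, c)
        for p in itertools.permutations(range(len(scores)))
    )
    return most_violating - joint_score(scores, target, c)


def min_over_labels_hinge(scores, rel):
    # Multi-label extension: the smallest hinge over all correct labels.
    # A minimum of convex functions is not convex in general.
    labels = [p for p in itertools.permutations(range(len(scores))) if is_correct(rel, p)]
    return min(structured_hinge(scores, rel, p) for p in labels)


rel = [1, 1, 0]  # items 0 and 1 share a grade, so two permutations are correct
a, b = [1.0, 0.0, 5.0], [0.0, 1.0, 5.0]
mid = [(u + v) / 2 for u, v in zip(a, b)]

# Caveat (a): the label space has n! elements.
print(math.factorial(3), len(list(itertools.permutations(range(3)))))  # prints: 6 6

# Single-label hinge (target fixed to (0, 1, 2)) stays below its chord ...
ha, hb, hm = (structured_hinge(s, rel, (0, 1, 2)) for s in (a, b, mid))
print(hm <= (ha + hb) / 2)  # prints: True

# ... but the min-over-labels surrogate rises above it, so it is not convex.
la, lb, lm = (min_over_labels_hinge(s, rel) for s in (a, b, mid))
print(lm > (la + lb) / 2)  # prints: True
```

With these assumed weights, the min-over-labels surrogate equals 3.25 at the midpoint but only about 3.13 at both endpoints, so it lies above its chord. The fixed-target hinge also equals 3.25 at the midpoint, which is below its chord value of about 3.32. The brute-force maximization is only feasible for very small n, which is caveat (a) in practice.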
