Sparse Learning-to-Rank via an Efficient Primal-Dual Algorithm

Learning-to-rank for information retrieval has gained increasing interest in recent years. Inspired by the success of sparse models, we consider the problem of sparse learning-to-rank, where the learned ranking models are constrained to have only a few nonzero coefficients. We begin by formulating the sparse learning-to-rank problem as a convex optimization problem with a sparsity-inducing $\ell_1$ constraint. Since the $\ell_1$ constraint is nondifferentiable, the critical issue is how to solve the optimization problem efficiently. To address this issue, we propose a learning algorithm from the primal-dual perspective. Furthermore, we prove that, after at most $O(\frac{1}{\epsilon})$ iterations, the proposed algorithm is guaranteed to obtain an $\epsilon$-accurate solution. This convergence rate is better than that of the popular subgradient descent algorithm, which is $O(\frac{1}{\epsilon^2})$. Empirical evaluation on several public benchmark data sets demonstrates the effectiveness of the proposed algorithm: 1) Compared to methods that learn dense models, learning a ranking model under a sparsity constraint significantly improves ranking accuracy. 2) Compared to other methods for sparse learning-to-rank, the proposed algorithm tends to obtain sparser models and yields gains in both ranking accuracy and training time. 3) Compared to several state-of-the-art algorithms, the ranking accuracy of the proposed algorithm is very competitive and stable.
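
The abstract does not spell out the formulation, but as a rough illustration of the kind of problem being solved, the sketch below trains a pairwise-hinge ranking model under an $\ell_1$-ball constraint using projected subgradient descent, i.e., the $O(\frac{1}{\epsilon^2})$ baseline the abstract compares against, not the proposed primal-dual algorithm. The data layout (`X_pos`, `X_neg` as matched rows of preference pairs), the ball radius, and the step-size schedule are illustrative assumptions.

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto the l1 ball of radius z
    (standard sort-based routine, cf. Duchi et al., 2008)."""
    if np.sum(np.abs(v)) <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                 # magnitudes, descending
    cssv = np.cumsum(u)
    rho = np.nonzero(u - (cssv - z) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (cssv[rho] - z) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def pairwise_hinge_subgradient(w, X_pos, X_neg):
    """Subgradient of (1/m) * sum_i max(0, 1 - w^T (x_i^+ - x_i^-)),
    where row i of X_pos should be ranked above row i of X_neg."""
    diffs = X_pos - X_neg                        # one row per preference pair
    active = (diffs @ w) < 1.0                   # pairs violating the margin
    return -diffs[active].sum(axis=0) / len(diffs)

def train_sparse_ranker(X_pos, X_neg, radius=10.0, step=0.1, iters=1000):
    """Projected subgradient descent under ||w||_1 <= radius (illustrative
    baseline only; not the paper's primal-dual method)."""
    w = np.zeros(X_pos.shape[1])
    for t in range(1, iters + 1):
        g = pairwise_hinge_subgradient(w, X_pos, X_neg)
        w = project_l1_ball(w - (step / np.sqrt(t)) * g, radius)
    return w
```

The $\ell_1$-ball constraint is what produces sparsity: the projection step zeroes out coefficients whose magnitude falls below the threshold $\theta$, so the returned `w` typically has only a few nonzero entries when `radius` is small relative to the data scale.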
