FSMRank: Feature Selection Algorithm for Learning to Rank

In recent years, there has been growing interest in learning to rank. At the same time, feature selection has proven effective across a range of learning problems. These observations motivate us to investigate feature selection for learning to rank. We propose a joint convex optimization formulation that minimizes ranking errors while simultaneously performing feature selection. This formulation provides a flexible framework into which various importance measures and similarity measures of the features can easily be incorporated. To solve the optimization problem, we use Nesterov's approach to derive an accelerated gradient algorithm with a fast convergence rate of O(1/T²). We further develop a generalization bound for the proposed optimization problem based on Rademacher complexities. Extensive experimental evaluations on the public LETOR benchmark datasets demonstrate that the proposed method achieves: 1) significant gains in ranking performance over several feature selection baselines for ranking, and 2) very competitive performance compared with several state-of-the-art learning-to-rank algorithms.
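The abstract does not spell out the exact objective, so the sketch below is an illustration rather than the authors' FSMRank formulation: it minimizes a pairwise squared hinge ranking loss plus an l1 penalty (a standard way to induce feature selection) with a Nesterov-accelerated proximal gradient loop of the FISTA flavor, which attains the O(1/T²) rate mentioned above for composite convex objectives. The function names, the choice of loss, the l1 penalty, and the fixed step size are all assumptions made for illustration.

```python
# Hedged sketch: Nesterov-accelerated proximal gradient for a sparse linear
# ranking model. NOT the exact FSMRank objective (unspecified in the abstract);
# the pairwise squared hinge loss and l1 penalty are illustrative assumptions.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def accelerated_sparse_rank(X_pos, X_neg, lam=0.1, step=1e-2, T=500):
    """Learn a sparse scoring vector w for pairwise ranking.

    X_pos, X_neg : (n, d) arrays; row i of X_pos should outrank row i of
    X_neg. Objective: mean_i max(0, 1 - w.(x_pos_i - x_neg_i))^2 + lam*||w||_1.
    """
    diff = X_pos - X_neg                      # pairwise feature differences
    n, d = diff.shape
    w = np.zeros(d)
    z = w.copy()                              # extrapolated (momentum) point
    t = 1.0
    for _ in range(T):
        slack = np.maximum(1.0 - diff @ z, 0.0)          # hinge slacks at z
        grad = -2.0 * (diff.T @ slack) / n               # grad of smooth part
        w_next = soft_threshold(z - step * grad, step * lam)  # prox step
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = w_next + ((t - 1.0) / t_next) * (w_next - w)      # Nesterov momentum
        w, t = w_next, t_next
    return w

# Toy usage: 200 preference pairs over 30 features, 5 of them informative.
rng = np.random.default_rng(0)
w_true = np.zeros(30); w_true[:5] = 1.0
X_pos = rng.normal(size=(200, 30)); X_neg = rng.normal(size=(200, 30))
# Keep only pairs whose true score order matches the pos/neg labeling.
keep = (X_pos - X_neg) @ w_true > 0
w = accelerated_sparse_rank(X_pos[keep], X_neg[keep], lam=0.05)
print("selected features:", np.nonzero(np.abs(w) > 1e-6)[0])
```

Provided the step size stays below the inverse Lipschitz constant of the smooth loss, this loop enjoys the O(1/T²) objective-gap rate of Nesterov's method; entries of the returned w driven exactly to zero by the soft-thresholding step correspond to discarded features, which is how the sparsity penalty doubles as feature selection.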
