Linear feature extraction for ranking

We address the feature extraction problem for document ranking in information retrieval, and propose LifeRank, a Linear feature extraction algorithm for Ranking. In LifeRank, we regard each document collection for ranking as a matrix, referred to as the original matrix, and optimize a transformation matrix so that a new matrix (dataset) is generated as the product of the original matrix and the transformation matrix. The transformation matrix projects high-dimensional document vectors into lower dimensions. In principle, the space of candidate transformation matrices is very large, each candidate yielding a different generated matrix; LifeRank produces a transformation matrix whose generated matrix best fits the learning to rank problem. Extensive experiments on benchmark datasets show the performance gains of LifeRank over state-of-the-art feature selection algorithms.
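To make the core construction concrete, the following is a minimal sketch (not the authors' implementation) of generating a lower-dimensional dataset as the product of an original feature matrix and a transformation matrix. The matrix shapes and the PCA-style initialization of the transformation matrix are illustrative assumptions; LifeRank instead optimizes this matrix for ranking performance.

```python
import numpy as np

# Illustrative shapes: n documents, d original features, k < d extracted features.
# (The value d = 46 echoes LETOR-style feature counts; purely an assumption here.)
n, d, k = 1000, 46, 10

rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))   # original matrix: one row per document

# Any d-by-k matrix W defines a candidate low-dimensional dataset X @ W.
# As a simple starting point we take the top-k principal directions of X;
# LifeRank would instead search for a W tailored to the ranking objective.
Xc = X - X.mean(axis=0)                        # center features
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:k].T                                   # d x k transformation matrix

X_new = X @ W                                  # generated matrix: n x k
print(X_new.shape)                             # (1000, 10)
```

Under this view, the learning problem is the choice of W: every W projects the same documents into a different k-dimensional feature space, and the quality of a candidate is judged by how well a ranker trained on X_new performs.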
