Feature selection for ranking

Ranking is a central topic in information retrieval. While algorithms for learning ranking models have been studied intensively, feature selection for ranking has not, despite its importance. In practice, many feature selection methods designed for classification are applied directly to ranking. We argue that, because of the striking differences between ranking and classification, it is better to develop feature selection methods specifically for ranking. To this end, we propose a new feature selection method in this paper. Specifically, for each feature we use its values to rank the training instances, and we define the resulting ranking accuracy, measured by a performance measure or a loss function, as the importance of the feature. We also define the correlation between the ranking results of two features as the similarity between them. Based on these definitions, we formulate feature selection as an optimization problem: find the features with maximum total importance scores and minimum total similarity scores. We also demonstrate how to solve the optimization problem efficiently. We have tested the effectiveness of our feature selection method on two information retrieval datasets and with two ranking models. Experimental results show that our method can outperform traditional feature selection methods for the ranking task.
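The idea can be illustrated with a minimal sketch. The abstract does not give the exact measures or the solver, so everything specific here is an assumption: importance is taken to be pairwise ranking accuracy, similarity is Kendall's tau between the orderings induced by two features, and the optimization is approximated by a simple greedy heuristic with a hypothetical trade-off parameter `c`. The function names and the toy data are illustrative only.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists (tied pairs count as 0)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    total = 0
    for i, j in pairs:
        prod = (a[i] - a[j]) * (b[i] - b[j])
        total += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return total / len(pairs)


def pairwise_accuracy(scores, labels):
    """Fraction of differently labelled pairs that the scores order correctly."""
    correct = total = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue
        total += 1
        if (scores[i] - scores[j]) * (labels[i] - labels[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


def column(X, j):
    """Values of feature j across all instances of one query."""
    return [row[j] for row in X]


def select_features(queries, k, c=0.5):
    """Greedily pick k features (assumed heuristic, not necessarily the paper's solver).

    queries: list of (X, labels) pairs, one per query; X is a list of feature rows.
    c: assumed trade-off between importance and similarity.
    """
    n = len(queries[0][0][0])
    q = len(queries)
    # Importance: mean ranking accuracy when instances are ranked by one feature.
    weights = [sum(pairwise_accuracy(column(X, j), y) for X, y in queries) / q
               for j in range(n)]
    # Similarity: mean rank correlation between the orderings of two features.
    sim = [[sum(kendall_tau(column(X, i), column(X, j)) for X, _ in queries) / q
            for j in range(n)] for i in range(n)]
    selected, remaining = [], set(range(n))
    while remaining and len(selected) < k:
        # Highest current weight wins; ties go to the lowest feature index.
        best = max(remaining, key=lambda j: (weights[j], -j))
        selected.append(best)
        remaining.remove(best)
        for j in remaining:
            # Penalize features that rank instances like the one just chosen.
            weights[j] -= 2 * c * sim[best][j]
    return selected


# Toy data: one query, four documents, three features.
# Feature 1 is a rescaled copy of feature 0; feature 2 is weaker but different.
X = [[0.9, 1.8, 0.5],
     [0.6, 1.2, 0.9],
     [0.5, 1.0, 0.1],
     [0.1, 0.2, 0.3]]
labels = [2, 1, 1, 0]
chosen = select_features([(X, labels)], k=2)
print(chosen)
```

On this toy input, features 0 and 1 rank the documents identically, so once feature 0 is chosen the redundant feature 1 is penalized and the less accurate but complementary feature 2 is picked instead. A classification-style filter that scores features independently would have kept both copies.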
