Relevance Ranking Using Kernels

This paper is concerned with relevance ranking in search, particularly ranking that uses term dependency information. It proposes a novel and unified approach to relevance ranking based on the kernel technique from statistical learning. In this approach, the general ranking model is defined as a kernel function of query and document representations. Several kernel functions are proposed as specific ranking models, including the BM25 Kernel, the LMIR Kernel, and the KL Divergence Kernel. The approach has the following advantages: (1) the (general) model can effectively represent different types of term dependency information and can thus achieve high performance; (2) the model has strong connections with existing models such as BM25 and LMIR; (3) it has a solid theoretical background; (4) the model can be computed efficiently. Experimental results on a web search dataset and TREC datasets show that the proposed kernel approach outperforms MRF and other baseline methods for relevance ranking.
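
To make the idea of a ranking model defined as a kernel function of query and document representations more concrete, the following is a minimal sketch and not the paper's actual BM25 Kernel. It assumes that the query and the document are each mapped to a vector indexed by n-grams (unigrams and bigrams here, as one simple form of term dependency), and that the score is a weighted sum of dot products between the two vectors. The function names, the BM25-style saturation formulas, the parameters `k1`, `b`, `k3`, and the per-order `weights` are all illustrative assumptions; IDF weighting is omitted for brevity.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def query_map(tokens, n, k3=1000.0):
    """Hypothetical query-side feature map: saturated n-gram counts."""
    counts = Counter(ngrams(tokens, n))
    return {g: (k3 + 1) * c / (k3 + c) for g, c in counts.items()}


def doc_map(tokens, n, avg_len, k1=1.2, b=0.75):
    """Hypothetical document-side feature map: BM25-style saturated,
    length-normalized n-gram counts."""
    counts = Counter(ngrams(tokens, n))
    norm = k1 * (1 - b + b * len(tokens) / avg_len)
    return {g: (k1 + 1) * c / (norm + c) for g, c in counts.items()}


def kernel_score(query, doc, avg_len, max_n=2, weights=(1.0, 0.5)):
    """Score = weighted sum, over n-gram orders, of the dot product
    between the query and document feature vectors (assumed form)."""
    score = 0.0
    for n in range(1, max_n + 1):
        q_vec = query_map(query, n)
        d_vec = doc_map(doc, n, avg_len)
        dot = sum(v * d_vec.get(g, 0.0) for g, v in q_vec.items())
        score += weights[n - 1] * dot
    return score


if __name__ == "__main__":
    query = ["kernel", "ranking"]
    doc_in_order = ["kernel", "ranking", "model"]
    doc_scattered = ["ranking", "model", "kernel"]
    print(kernel_score(query, doc_in_order, avg_len=3.0))
    print(kernel_score(query, doc_scattered, avg_len=3.0))
```

In this sketch the two example documents contain the same query terms and have the same length, so their unigram contributions are identical; only the document that contains the query terms as an adjacent, ordered pair also receives the bigram contribution. Because the query and document maps differ, the resulting function is asymmetric in its two arguments.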
