Query dependent ranking using K-nearest neighbor

Many ranking models have been proposed in information retrieval, and machine learning techniques have recently been applied to ranking model construction as well. Most existing methods ignore the significant differences that exist between queries and rely on a single function to rank documents. In this paper, we argue that different ranking models should be employed for different queries, an approach we call query-dependent ranking. As the first such attempt, we propose a K-Nearest Neighbor (KNN) method for query-dependent ranking. We first consider an online method that creates a ranking model for a given query from the labeled neighbors of that query in the query feature space and then ranks the documents for the query using the created model. Next, we give two offline approximations of the method, which create the ranking models in advance to make ranking more efficient. We also prove a theoretical result indicating that the approximations are accurate in terms of the difference in prediction loss, provided that the learning algorithm used is stable with respect to minor changes in the training examples. Our experimental results show that the proposed online and offline methods both outperform the baseline method of using a single ranking function.
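
The online method described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the ridge-regularized pointwise linear ranker, the toy data, and all function names (`knn_query_indices`, `fit_linear_ranker`, `rank_query_dependent`) are assumptions chosen only to keep the example self-contained. The sketch finds the k labeled training queries nearest to a new query in the query feature space, fits a ranking model on their labeled documents, and ranks the new query's documents with that model.

```python
import numpy as np


def knn_query_indices(query_vec, train_query_vecs, k):
    """Indices of the k training queries closest to query_vec (Euclidean; an assumption)."""
    dists = np.linalg.norm(train_query_vecs - query_vec, axis=1)
    return np.argsort(dists, kind="stable")[:k]


def fit_linear_ranker(doc_features, labels, ridge=1e-3):
    """Stand-in learner: ridge-regularized least squares on relevance labels.

    Any learning-to-rank algorithm could be substituted here.
    """
    dim = doc_features.shape[1]
    gram = doc_features.T @ doc_features + ridge * np.eye(dim)
    return np.linalg.solve(gram, doc_features.T @ labels)


def rank_query_dependent(query_vec, docs, train_query_vecs, train_docs, train_labels, k=2):
    """Train a model on the k nearest labeled queries, then rank docs (best first)."""
    neighbors = knn_query_indices(query_vec, train_query_vecs, k)
    features = np.vstack([train_docs[i] for i in neighbors])
    labels = np.concatenate([train_labels[i] for i in neighbors])
    weights = fit_linear_ranker(features, labels)
    scores = docs @ weights
    return np.argsort(-scores, kind="stable")


# Hypothetical toy data: two groups of training queries with opposite notions
# of relevance (document feature 0 matters for group A, feature 1 for group B).
train_query_vecs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
shared_docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
train_docs = [shared_docs] * 4
train_labels = [
    np.array([1.0, 0.0, 1.0]),  # group A
    np.array([1.0, 0.0, 1.0]),  # group A
    np.array([0.0, 1.0, 1.0]),  # group B
    np.array([0.0, 1.0, 1.0]),  # group B
]

candidate_docs = np.array([[1.0, 0.0], [0.0, 1.0]])
order_near_a = rank_query_dependent(
    np.array([0.02, 0.0]), candidate_docs, train_query_vecs, train_docs, train_labels
)
order_near_b = rank_query_dependent(
    np.array([5.02, 5.0]), candidate_docs, train_query_vecs, train_docs, train_labels
)
print(order_near_a, order_near_b)
```

In this toy setting the same two candidate documents are ordered differently depending on which group of training queries the new query falls near, which is the behavior a single global ranking function cannot reproduce. Because a model is trained per query at ranking time, this online variant is costly, which motivates the offline approximations mentioned in the abstract.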
