Improving Tail Query Performance by Fusion Model

Tail queries, which occur with low frequency, make up a large fraction of unique queries and often affect users' experience during Web search. Because of data sparseness, the information that can be leveraged for tail queries is insufficient, so improving tail query performance is both important and difficult. According to our observations, 26% of tail queries are not inherently scarce: they are expressed in an unusual way, but the underlying information needs are not rare. In this study, we improve tail query performance by fusing the results of the original query with those of its query reformulation candidates. Beyond re-ranking the original results, the fusion model can also introduce new results. We emphasize that the queries that can be improved are not limited to poorly performing ones, and we propose extracting features that predict whether performance can be improved. We then use a learning-to-rank method, trained to directly optimize a retrieval metric, to fuse the documents and obtain a final result list. We conducted experiments using data from two popular Chinese search engines. The results indicate that our fusion method significantly improves the performance of tail queries and outperforms state-of-the-art approaches on the same reformulations. Experiments also show that our method is effective for non-tail queries.
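To make the idea of result fusion concrete, the following is a minimal sketch of pooling the result lists of an original query and its reformulations into one ranking. It is not the learning-to-rank model described above: it uses a simple weighted sum of normalized scores as a stand-in, and the function name `fuse_results`, the per-list `weights`, and the document IDs are all hypothetical. In the approach described above, a trained ranker would replace the hand-set weights.

```python
from collections import defaultdict


def fuse_results(ranked_lists, weights, top_k=10):
    """Pool documents from several ranked lists and re-score them.

    ranked_lists: one list of (doc_id, score) pairs per query, e.g. the
                  original query first, then each reformulation.
    weights:      one hypothetical confidence weight per list (an
                  assumption; a learned model would supply these).
    """
    fused = defaultdict(float)
    for results, weight in zip(ranked_lists, weights):
        if not results:
            continue
        # Min-max normalize scores so that lists are comparable.
        scores = [score for _, score in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero
        for doc_id, score in results:
            fused[doc_id] += weight * (score - lo) / span
    # Sort by fused score (descending), breaking ties by doc_id.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


original = [("d1", 3.0), ("d2", 1.0)]
reformulation = [("d3", 5.0), ("d1", 4.0), ("d2", 3.0)]
print(fuse_results([original, reformulation], weights=[1.0, 0.5]))
# prints ['d1', 'd3', 'd2']
```

Here `d3` appears only in the reformulation's results, yet it enters the fused list, which illustrates how fusion can introduce new documents rather than only re-rank the original ones.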
