Exploring reductions for long web queries

Long queries form a difficult but increasingly important segment for web search engines. Query reduction, a technique for dropping unnecessary terms from long queries, improves the performance of ad-hoc retrieval on TREC collections. It also has great potential for improving long web queries (up to 25% improvement in NDCG@5). However, query reduction on the web is hampered by the lack of accurate query performance predictors and by the constraints imposed by search engine architectures and ranking algorithms. In this paper, we present query reduction techniques for long web queries that leverage effective and efficient query performance predictors. We propose three learning formulations that combine these predictors to perform automatic query reduction. These formulations make it possible to trade off average improvement against the number of queries affected, and they integrate easily into a search engine's architecture for rank-time query reduction. Experiments on a large collection of long queries issued to a commercial search engine show that the proposed techniques significantly outperform baselines, with more than 12% improvement in NDCG@5 on the set of affected queries. Extensions to the formulations, such as result interleaving, further improve results. We find that the proposed techniques deliver consistent retrieval gains where they matter most: poorly performing long web queries.
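The abstract does not spell out the three learning formulations, so the Python below is only a minimal sketch of the general idea under stated assumptions, not the paper's method. Candidate reductions are formed by dropping up to `max_drop` terms. A query performance predictor scores each candidate; this is modeled as a hypothetical `predictor` callable, and the toy stand-in `toy_predictor` is illustrative only. The original query is replaced only when the predicted gain exceeds `threshold`. Raising `threshold` is one simple way to trade average improvement against the number of queries affected. `ndcg_at_k` uses the common (2^g - 1) / log2(rank + 1) form of NDCG@k, with the ideal ordering taken from the same judged list. All function names here are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence


def ndcg_at_k(gains: Sequence[float], k: int = 5) -> float:
    """NDCG@k using (2^g - 1) / log2(rank + 1); ideal order taken from the same gains."""
    def dcg(gs: Sequence[float]) -> float:
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gs[:k]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(list(gains)) / ideal if ideal > 0 else 0.0


def reduced_versions(query: str, max_drop: int = 1) -> List[str]:
    """All non-empty versions of `query` with 1..max_drop terms removed."""
    terms = query.split()
    out = []
    for n in range(1, max_drop + 1):
        for dropped in combinations(range(len(terms)), n):
            kept = [t for i, t in enumerate(terms) if i not in dropped]
            if kept:
                out.append(" ".join(kept))
    return out


def choose_query(query: str, predictor: Callable[[str], float],
                 threshold: float = 0.0, max_drop: int = 1) -> str:
    """Pick the best-scoring reduction, but keep the original unless the
    predicted gain over it exceeds `threshold` (hypothetical selection rule)."""
    base = predictor(query)
    best, best_score = query, base
    for cand in reduced_versions(query, max_drop):
        score = predictor(cand)
        if score > best_score:  # strict: ties keep the earlier candidate
            best, best_score = cand, score
    # A larger threshold means fewer queries are rewritten.
    return best if best_score - base > threshold else query


# Toy stand-in for a real query performance predictor (assumption, illustration only):
# scores a query by the fraction of its terms found in a hand-picked set.
USEFUL_TERMS = {"cheap", "flights", "boston"}


def toy_predictor(query: str) -> float:
    terms = query.split()
    return sum(t in USEFUL_TERMS for t in terms) / len(terms)


if __name__ == "__main__":
    q = "how do i find cheap flights to boston"
    print(choose_query(q, toy_predictor))                 # do i find cheap flights to boston
    print(choose_query(q, toy_predictor, threshold=0.1))  # how do i find cheap flights to boston
    print(ndcg_at_k([3, 2, 1, 0, 0]))                     # 1.0
```

With `threshold=0.0` the toy query is shortened; with `threshold=0.1` the predicted gain (about 0.054) is too small, so the original query is kept. A real system would replace `toy_predictor` with learned predictors and would measure the actual effect with `ndcg_at_k` on judged results for both the original and the reduced query.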
