Characterizing, predicting, and handling web search queries that match very few or no results

A non‐negligible fraction of user queries end up with very few or even no matching results in leading commercial web search engines. In this work, we provide a detailed characterization of such queries and show that search engines try to improve the response to such queries by showing the results of related queries. Through a user study, we show that these query suggestions are usually perceived as relevant. Also, through a query log analysis, we show that users are dissatisfied at least 88.5% of the time after submitting a query that matches no results. As a first step towards handling these no‐answer queries, we devised a large number of features that can be used to identify such queries and built machine‐learning models on top of them. These models can be useful in scenarios such as mobile search or meta‐search, where identifying a query that will retrieve no results on the client device (i.e., even before submitting it to the search engine) may yield gains in terms of bandwidth usage, power consumption, and/or monetary costs. Experiments over query logs indicate that, despite the heavy skew in class sizes, our models achieve good prediction quality, with an area under the ROC curve (AUC) of up to 0.95.
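To illustrate why prediction quality is reported as area under the curve rather than plain accuracy when class sizes are heavily skewed, the following is a minimal sketch in Python. It is not the models or features used in this work: the labels and scores are made-up toy values, and the `auc` helper is a hypothetical name that computes AUC by pairwise comparison (the Mann-Whitney formulation).

```python
def auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = query matches no results, 0 = query has results.
# Only 2 of 10 queries are no-answer queries, mimicking class skew.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
# Hypothetical classifier scores: higher means "more likely no-answer".
scores = [0.9, 0.1, 0.2, 0.05, 0.3, 0.4, 0.6, 0.15, 0.1, 0.2]

toy_auc = auc(labels, scores)

# A trivial baseline that never predicts "no-answer" has high accuracy
# under skew, but its constant scores carry no ranking information.
baseline_accuracy = sum(1 for y in labels if y == 0) / len(labels)
baseline_auc = auc(labels, [0.0] * len(labels))

print(f"classifier AUC:    {toy_auc:.4f}")
print(f"baseline accuracy: {baseline_accuracy:.2f}")
print(f"baseline AUC:      {baseline_auc:.2f}")
```

In this toy setting the always-negative baseline looks strong on accuracy while its AUC stays at chance level, which is why a ranking-based measure is the more informative one for rare no-answer queries.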
