Web search solved?: all result rankings the same?

The objective of this work is to derive quantitative statements about what fraction of web search queries issued to state-of-the-art commercial search engines lead to excellent results or, conversely, to poor results. To make such statements in an automated way, we propose a new measure based on lower- and upper-bound analysis of standard relevance measures. Moreover, we extend this measure to comparisons between competing search engines by introducing the concept of disruptive sets, which we use to estimate the degree to which a search engine solves queries that are not solved by its competitors. We report empirical results from a large editorial evaluation of the three largest search engines in the US market.
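To make the two central ideas concrete, below is a minimal sketch (not the authors' implementation) of lower/upper-bound analysis and disruptive sets, assuming DCG as the underlying relevance measure and graded judgments in {0,...,4}. All names, grade ranges, and thresholds are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch of bound analysis over a relevance measure (DCG assumed)
# and of disruptive sets. Grade scale and thresholds are assumptions.
import math
from typing import Optional, Sequence

MIN_GRADE, MAX_GRADE = 0, 4  # assumed graded-relevance scale


def dcg_bounds(grades: Sequence[Optional[int]]) -> tuple[float, float]:
    """Lower and upper bound on DCG when some results are unjudged (None).

    The lower bound assigns every unjudged result the minimum grade;
    the upper bound assigns it the maximum grade.
    """
    lo = hi = 0.0
    for rank, g in enumerate(grades, start=1):
        discount = 1.0 / math.log2(rank + 1)
        lo += (2 ** (MIN_GRADE if g is None else g) - 1) * discount
        hi += (2 ** (MAX_GRADE if g is None else g) - 1) * discount
    return lo, hi


GOOD, BAD = 10.0, 3.0  # illustrative thresholds on the DCG scale


def solved(grades: Sequence[Optional[int]]) -> bool:
    # A query is certainly "solved" if even the worst case is excellent.
    return dcg_bounds(grades)[0] >= GOOD


def disruptive_set(judgments: dict[str, dict[str, list]], engine: str) -> set[str]:
    """Queries solved by `engine` but by none of its competitors.

    `judgments` maps engine name -> {query -> list of per-rank grades}.
    """
    mine = judgments[engine]
    others = [r for name, r in judgments.items() if name != engine]
    return {
        q for q, g in mine.items()
        if solved(g) and not any(solved(r[q]) for r in others if q in r)
    }
```

Under these assumptions, the size of `disruptive_set(judgments, "A")` relative to the full query set estimates how often engine A uniquely solves a query; the bound-based definition of `solved` makes that estimate robust to missing judgments.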
