An overview of Web search evaluation methods

Web search evaluation is the process of measuring the effectiveness of a Web search system. Such evaluation helps identify the most effective among competing systems and helps users find the information they need with less effort. Web search systems have been evaluated in many different ways over the last 15 years. In this paper, we review some of the efforts made to evaluate Web search systems and discuss these studies by classifying them into eight categories. Because the size and content of the Web are changing rapidly, and Web search techniques are changing with them, we argue that an automatic evaluation methodology is necessary. At the same time, we emphasize that the significance of user-based evaluation cannot be neglected. Finally, we conclude that an automatic evaluation method that models user-feedback-based evaluation is required for effective and realistic evaluation of Web search systems.
