Using titles and category names from editor-driven taxonomies for automatic evaluation

Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large editor-driven taxonomies on the web opens the door to a new evaluation approach. We use the ODP (Open Directory Project) taxonomy to find sets of pseudo-relevant documents via one of two assumptions: 1) taxonomy entries are relevant to a given query if their editor-entered titles exactly match the query, or 2) all entries in a leaf-level taxonomy category are relevant to a given query if the category title exactly matches the query. We compare and contrast these two methodologies by evaluating six web search engines on a sample from an America Online log of ten million web queries, using mean reciprocal rank (MRR) measures for the first method and precision-based measures for the second. We show that this technique is stable with respect to the query set selected and correlated with a reasonably large manual evaluation.
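The two evaluation modes described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the query sets, URL normalization, and tie-breaking details are assumed, and the small example data is invented. Once a pseudo-relevant set per query has been built from directory titles (mode 1) or leaf-category contents (mode 2), scoring a search engine reduces to MRR or precision@k over its ranked results.

```python
# Illustrative sketch (assumed details, not the paper's code): score a
# search engine's rankings against pseudo-relevant sets mined from a
# directory taxonomy such as ODP.

def mrr(rankings, pseudo_relevant):
    """Mean reciprocal rank: for each query, 1/rank of the first ranked
    URL found in that query's pseudo-relevant set (0 if none appears).
    Suits mode 1, where an exact title match yields a small target set."""
    total = 0.0
    for query, ranked_urls in rankings.items():
        relevant = pseudo_relevant.get(query, set())
        rr = 0.0
        for rank, url in enumerate(ranked_urls, start=1):
            if url in relevant:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(rankings)

def precision_at_k(rankings, pseudo_relevant, k=10):
    """Mean precision@k: fraction of the top k results that fall in the
    pseudo-relevant set. Suits mode 2, where every entry of a matching
    leaf category counts as relevant."""
    total = 0.0
    for query, ranked_urls in rankings.items():
        relevant = pseudo_relevant.get(query, set())
        hits = sum(1 for url in ranked_urls[:k] if url in relevant)
        total += hits / k
    return total / len(rankings)
```

A usage example: with rankings `{"q1": ["a", "b", "c"], "q2": ["x", "y"]}` and pseudo-relevant sets `{"q1": {"b"}, "q2": {"x"}}`, MRR is (1/2 + 1)/2 = 0.75 and precision@2 is (1/2 + 1/2)/2 = 0.5.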
