The Very Large Collection and Web Tracks (Preprint version)

Together, the TREC Very Large Collection (VLC) Track and its successor, the Web Track, have run for seven years, after an initial VLC pre-track. During that time five new test collections have been created, five different types of retrieval task have been studied, a large number of important issues have been addressed, and new methods have been tried, not only for retrieval but also for test collection construction. Since the Web Track was a natural evolutionary step from the VLC Track, from here on we refer to them as a single VLC/Web Track. The corpora created in support of the track have been distributed to more than 120 organisations worldwide; they are clearly being used for evaluation and research well beyond the confines of TREC. Moreover, the Web Track model has been adopted for similar Japanese-language evaluations within NTCIR (NII-NACSIS Test Collection for IR Systems, research.nii.ac.jp/ntcir/index-en.html). Each edition of the VLC/Web Track (except the 1996 VLC pre-track) has already been described in a Track Overview paper in the appropriate TREC Proceedings [29, 26, 30, 20, 22, 23, 16]. This chapter:

[1] Peter Bailey, et al. Overview of the TREC-8 Web Track, 2000, TREC.

[2] Hugh E. Williams, et al. A general-purpose compression scheme for large collections, 2002, TOIS.

[3] Ernst Gombrich. A Little History of the World, 2005.

[4] Hugh E. Williams, et al. Efficient phrase querying with an auxiliary index, 2002, SIGIR '02.

[5] Andrei Broder, et al. A taxonomy of web search, 2002, SIGF.

[6] Robert H. Zakon, et al. Hobbes' Internet Timeline, 1997, RFC.

[7] Stephen E. Robertson, et al. Effective site finding using link anchor information, 2001, SIGIR '01.

[8] Peter Bailey, et al. Is it fair to evaluate Web systems using TREC ad hoc methods, 1999, SIGIR 1999.

[9] David Hawking, et al. Overview of the TREC-9 Web Track, 2000, TREC.

[10] Hugh E. Williams, et al. Compression of inverted indexes for fast query evaluation, 2002, SIGIR '02.

[11] Peter Bailey, et al. Measuring Search Engine Quality, 2001, Information Retrieval.

[12] Kathryn S. McKinley, et al. Partial replica selection based on relevance for information retrieval, 1999, SIGIR '99.

[13] David Hawking, et al. Overview of TREC-7 Very Large Collection Track, 1997, TREC.

[14] Peter Bailey, et al. Engineering a multi-purpose test collection for Web retrieval experiments, 2003, Inf. Process. Manag.

[15] Michael D. Gordon, et al. Finding Information on the World Wide Web: The Retrieval Effectiveness of Search Engines, 1999, Inf. Process. Manag.

[16] John Moore, et al. The Z39.50 information retrieval standard, 2000.

[17] Ellen M. Voorhees, et al. Evaluation by highly relevant documents, 2001, SIGIR '01.

[18] Claudio Carpineto, et al. Improving retrieval feedback with multiple term-ranking function combination, 2002, TOIS.

[19] Alistair Moffat, et al. Impact transformation: effective and efficient web retrieval, 2002, SIGIR '02.

[20] Robert Cailliau, et al. A little history of the World Wide Web, 1995.

[21] David Hawking, et al. Query-independent evidence in home page finding, 2003, TOIS.

[22] G. Seber, et al. The estimation of animal abundance and related parameters, 1974.

[23] Weiguo Fan, et al. Getting answers to natural language questions on the Web, 2002, J. Assoc. Inf. Sci. Technol.

[24] Kevyn Collins-Thompson, et al. Information Filtering, Novelty Detection, and Named-Page Finding, 2002, TREC.

[25] Charles L. A. Clarke, et al. Shortest-substring retrieval and ranking, 2000, TOIS.

[26] David Hawking, et al. Challenges in Enterprise Search, 2004, ADC.

[27] Djoerd Hiemstra, et al. The Importance of Prior Probabilities for Entry Page Search, 2002, SIGIR '02.

[28] Donna K. Harman, et al. Results and Challenges in Web Search Evaluation, 1999, Comput. Networks.

[29] Djoerd Hiemstra, et al. Retrieving Web Pages Using Content, Links, URLs and Anchors, 2001, TREC.

[30] Ronald Fagin, et al. Searching the workplace web, 2003, WWW '03.

[31] Alistair Moffat, et al. Vector-space ranking with effective early termination, 2001, SIGIR '01.

[32] Ophir Frieder, et al. Collection statistics for fast duplicate document detection, 2002, TOIS.

[33] David Hawking, et al. Overview of the TREC-2002 Web Track, 2002, TREC.

[34] Prabhakar Raghavan, et al. Navigating large-scale semi-structured data in business portals, 2001, VLDB.

[35] Amit Singhal, et al. A case study in web search using TREC algorithms, 2001, WWW '01.

[36] David Hawking, et al. Overview of the TREC-2001 Web track, 2002.

[37] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[38] David Hawking, et al. Overview of the TREC 2003 Web Track, 2003, TREC.

[39] Stephen E. Robertson, et al. On Collection Size and Retrieval Effectiveness, 2004, Information Retrieval.