An evaluation of the Web retrieval task at the third NTCIR workshop

We have investigated evaluation methods for measuring the retrieval effectiveness of Web search engine systems, attempting to make them suitable for the real Web environment. With this objective, we conducted the ‘Web Retrieval Task’ at the Third NTCIR Workshop (‘NTCIR-3 WEB’) from 2001 to 2002 [1, 2, 3]. Through NTCIR-3 WEB, we built a reusable test collection suitable for evaluating Web search engine systems and evaluated the retrieval effectiveness of a number of such systems. The TREC Web Tracks [4] are well-known evaluation efforts whose objective is to study retrieval over large-scale Web document data. Past TREC Web Tracks have used document sets extracted from ‘the Internet Archive’ or pages gathered from the ‘.gov’ domain, and they assessed relevance only for information given in English. NTCIR-3 WEB, in contrast, used 100-gigabyte and 10-gigabyte document sets gathered mainly from the ‘.jp’ domain. Relevance judgments were performed on retrieved documents written in Japanese or English, partially taking hyperlinks into account. By considering hyperlinks, a ‘hub page’ that provides out-links to multiple ‘authority pages’ [5] may be judged relevant even if the hub page itself does not contain sufficient relevant information. Sixteen groups enrolled to participate in NTCIR-3 WEB, and seven of them submitted run results.
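The hub/authority distinction mentioned above follows Kleinberg's formulation [5], in which hub and authority scores reinforce each other over the link graph: a good hub links to many good authorities, and a good authority is linked to by many good hubs. The sketch below is only an illustration of that mutual-reinforcement computation on a hypothetical toy graph in Python; it is not the assessment procedure used in NTCIR-3 WEB, and the graph and function names are assumptions made for the example.

```python
# Minimal HITS-style sketch: hub and authority scores over a toy link graph.
# The graph and names below are illustrative only; they are not NTCIR-3 WEB data.

def hits(out_links, iterations=50):
    """Compute hub and authority scores by mutual reinforcement.

    out_links: dict mapping each page to the list of pages it links to.
    """
    pages = set(out_links) | {p for targets in out_links.values() for p in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in out_links.get(q, ()))
                for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}

        # Hub score: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in out_links.get(p, ())) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}

    return hub, auth


if __name__ == "__main__":
    # A page ("hub") that links out to several content ("authority") pages,
    # mirroring the case where a hub may be judged relevant via its out-links.
    toy_graph = {"hub": ["a", "b", "c"], "a": [], "b": [], "c": []}
    hub_scores, auth_scores = hits(toy_graph)
    print(hub_scores)   # "hub" receives the highest hub score
    print(auth_scores)  # "a", "b", "c" receive the authority scores
```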