Overview of the Topical Classification Task at NTCIR-4 WEB

This paper gives an overview of the Topical Classification Task, which was conducted from 2003 to 2004 as one of the pilot experiments of the WEB Task at the Fourth NTCIR Workshop (‘NTCIR-4 WEB’). In this task, we attempted to assess, from the viewpoint of topical relevance, the effectiveness of systems that automatically classify documents retrieved by Web search engines. We use “Topical Classification” as a general term, so various techniques, such as text categorization or document clustering, can be used to create a classification of the documents. The target data set for the task comprised ranked lists of search result documents drawn from 100 gigabytes of document data gathered mainly from the ‘.jp’ domain. We evaluated the automatic classification systems on the basis of the information retrieval task, applying several measures that are often used in information retrieval evaluation. We also proposed new evaluation measures that take the number of classes into account.