
Finding "Abstract Fields" of Web Pages and Query Specific Retrieval - THUIR at TREC 2004 Web Track

In this year's TREC Web Track, THUIR participated in the Mixed-query Task. This task uses a single query set containing three kinds of queries (Homepage Finding, Named Page Finding and Topic Distillation), mixed together and unlabelled. We worked in two directions: finding a strong, robust unified approach that works well for all kinds of queries, and building a query-specific retrieval strategy that classifies queries by type and applies a type-specific approach to each. The use of non-content information was studied in both approaches. Using the topic distillation and navigational search tasks from last year's Web Track, we built a training set of 150 topics with their relevance judgments (qrels). This training set is used to evaluate the effectiveness of different methods for mixed-query search; the experiments in Sections 2, 3 and 4 are all based on it.
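The query-specific direction can be pictured as "classify, then dispatch". The sketch below is illustrative only and is not THUIR's method: the abstract does not say which features the classifier used, so the cue words, the hypothetical `classify_query` and `rank` helpers, the `Doc` type and the URL-depth weights are all assumptions. It also shows one kind of non-content evidence (URL depth, favouring site entry pages for homepage queries) on top of a text score assumed to come from some content-based retrieval model.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical cue words; NOT the features used in the paper.
HOMEPAGE_CUES = {"homepage", "home", "site", "official"}
TOPIC_CUES = {"information", "resources", "about", "guide"}


@dataclass
class Doc:
    url: str
    content_score: float  # assumed: a text-retrieval score computed elsewhere


def classify_query(query: str) -> str:
    """Assign one of the three query types using toy heuristics (assumption)."""
    terms = query.lower().split()
    if any(t in HOMEPAGE_CUES for t in terms):
        return "homepage"
    if len(terms) >= 3 or any(t in TOPIC_CUES for t in terms):
        return "topic_distillation"
    return "named_page"


def url_depth(url: str) -> int:
    """Count non-empty path segments; a site root such as http://a.gov/ gives 0."""
    path = urlparse(url).path
    return len([seg for seg in path.split("/") if seg])


def rank(query: str, docs: list[Doc]) -> list[Doc]:
    """Re-rank docs with a strategy chosen by the predicted query type."""
    qtype = classify_query(query)

    def score(d: Doc) -> float:
        if qtype == "homepage":
            # Non-content prior: strongly favour shallow URLs (site entry pages).
            return d.content_score / (1 + url_depth(d.url))
        if qtype == "topic_distillation":
            # Milder preference for shallow pages that may act as key resources.
            return d.content_score / (1 + 0.5 * url_depth(d.url))
        # Named page finding: rely on the content score alone.
        return d.content_score

    return sorted(docs, key=score, reverse=True)
```

For example, with one root page and one deep page, `rank("nasa home page", docs)` puts the root first, while `rank("apollo mission", docs)` is treated as a named-page query and keeps the higher text score on top. A unified approach would instead apply a single scoring function to every query, trading per-type tuning for robustness to classification errors.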

Yiqun Liu, Min Zhang, Shaoping Ma, Canhui Wang
