We participate in the document search and expert search tasks of the Enterprise Track at TREC 2008. The corpus and tasks are the same as the year before. Unlike TREC 2007, the topics come from CSIRO Enquiries, and the topic statements are richer and more colloquial. In document search, we look into key resource page pre-selection, the use of anchor text, query classification, and multi-field search. In expert search, we develop methods to detect expert identifiers and run experiments based on our previous PDD (personal description document) model.

This is the fourth year that the IR groups of Tsinghua University have participated in the TREC Enterprise Track. The approaches we studied this year include the use of anchor text, person entity identification, topic distillation with key resource pre-selection, query classification, and multi-field search. For the document search task, we mainly investigate the effects of key resource pre-selection and the use of anchor text. We first observe the distribution of high-quality resources and study several features for finding overview pages. We also perform link analysis: both the HITS and PageRank algorithms are employed to evaluate page quality. In addition, we attempted a novel link analysis method that incorporates document similarity. For the expert finding task, much of the effort went into name identification. We built a personal description document (PDD) for each candidate from various types of resources, and we obtain retrieved results from each description document collection.
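As an illustration of the kind of link analysis mentioned above, the following minimal sketch computes PageRank scores by power iteration over a small link graph. The toy graph, the page names, the damping factor of 0.85, and the iteration count are illustrative assumptions and not settings reported in this paper; the function `pagerank` is a hypothetical helper, not the implementation used in the submitted runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping every page to its score (scores sum to 1).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes an equal share of its rank to its targets.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site used only for illustration.
toy_links = {
    "home": ["overview", "contact"],
    "overview": ["home", "detail"],
    "detail": ["overview"],
    "contact": [],
}
scores = pagerank(toy_links)
best_page = max(scores, key=scores.get)
```

In this toy graph the page with the most incoming rank, `overview`, receives the highest score, which is the intuition behind using such scores as a page quality signal when pre-selecting key resource pages.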