The IR group participated in the cross-language information retrieval (CLIR) task at the Sixth NTCIR Workshop (NTCIR-6). In this paper, we describe our approach to the Chinese Single Language Information Retrieval (SLIR) task and the English-Chinese Bilingual CLIR (BLIR) task. We use both bigrams and single Chinese characters as index units and OKAPI BM25 as the retrieval model. The initially retrieved documents are re-ranked before they are used for standard query expansion. Re-ranking is performed by a label propagation-based semi-supervised learning algorithm that exploits the intrinsic structure underlying the large document collection. Since labeled relevant or irrelevant documents are generally not available in IR, our approach extracts pseudo-labeled documents from the ranked list of the initial retrieval. For pseudo-relevant documents, we determine a cluster of documents among the top-ranked ones via cluster validation-based k-means clustering; for pseudo-irrelevant documents, we pick a set of documents from the bottom of the list. The documents are then ranked via label propagation. For the Chinese SLIR task, experiments show that our method achieves a mean average precision of 0.3097 (rigid relevance judgment) and 0.4013 (relaxed relevance judgment) on the T-only run (title-based), and 0.3136 (rigid) and 0.4071 (relaxed) on the D-only run (short description-based). For the English-Chinese BLIR task, experiments show that our method achieves a mean average precision of 0.2013 (rigid) and 0.2931 (relaxed) on the T-only run, and 0.1911 (rigid) and 0.2804 (relaxed) on the D-only run.
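The indexing and re-ranking steps described above can be illustrated with a minimal Python sketch. It is not the system used in the runs: the function names (`index_units`, `label_propagation_rerank`), the cosine similarity graph, the fixed iteration count, and the toy document vectors are all assumptions made for illustration. The pseudo-relevant and pseudo-irrelevant document indices are passed in directly, whereas the paper selects them via cluster validation-based k-means and from the bottom of the ranked list.

```python
import math


def index_units(text):
    """Return overlapping character bigrams followed by single characters."""
    chars = [c for c in text if not c.isspace()]
    bigrams = [a + b for a, b in zip(chars, chars[1:])]
    return bigrams + chars


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_propagation_rerank(vectors, pseudo_rel, pseudo_irrel, iterations=50):
    """Re-rank documents by propagating pseudo labels over a similarity graph.

    Assumption: a simplified clamped propagation in which each unlabeled
    document repeatedly takes the similarity-weighted average score of its
    neighbours, while pseudo-labeled documents keep their scores.
    """
    n = len(vectors)
    # Dense similarity graph without self-loops.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # 1.0 = pseudo relevant, 0.0 = pseudo irrelevant, 0.5 = unknown.
    labels = {i: 1.0 for i in pseudo_rel}
    labels.update({i: 0.0 for i in pseudo_irrel})
    scores = [labels.get(i, 0.5) for i in range(n)]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in labels:
                updated.append(labels[i])  # clamp pseudo-labeled documents
                continue
            total = sum(weights[i])
            if total:
                avg = sum(weights[i][j] * scores[j] for j in range(n)) / total
                updated.append(avg)
            else:
                updated.append(scores[i])  # isolated document: keep its score
        scores = updated

    ranking = sorted(range(n), key=lambda i: -scores[i])
    return ranking, scores


# Toy example: five documents as hypothetical 2-dimensional term vectors.
toy_vectors = [[0.1, 0.9], [1.0, 0.0], [0.8, 0.3], [0.0, 1.0], [0.9, 0.1]]
toy_ranking, toy_scores = label_propagation_rerank(
    toy_vectors, pseudo_rel=[1], pseudo_irrel=[3]
)
toy_units = index_units("信息检索")
```

In this toy setting, documents whose vectors lie close to the pseudo-relevant document receive higher propagated scores than those close to the pseudo-irrelevant one. A real system would build the graph over the initially retrieved documents and feed the top of the re-ranked list into query expansion.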