Information Retrieval Using Label Propagation Based Ranking

The IR group participated in the cross-language information retrieval (CLIR) task at the sixth NTCIR workshop (NTCIR 6). In this paper, we describe our approach to the Chinese single-language information retrieval (SLIR) task and the English-Chinese bilingual CLIR (BLIR) task. We use both bigrams and single Chinese characters as index units and OKAPI BM25 as the retrieval model. The initially retrieved documents are re-ranked before they are used for standard query expansion. Re-ranking is performed by a label propagation-based semi-supervised learning algorithm that exploits the intrinsic structure underlying the large document collection. Since labeled relevant or irrelevant documents are generally not available in IR, our approach extracts pseudo-labeled documents from the ranked list of the initial retrieval. For pseudo-relevant documents, we select a cluster of documents from the top of the list via cluster validation-based k-means clustering; for pseudo-irrelevant documents, we pick a set of documents from the bottom of the list. All documents are then ranked via label propagation. For the Chinese SLIR task, experiments show that our method achieves a mean average precision of 0.3097 (rigid) and 0.4013 (relax) on the T-only run (title based), and 0.3136 (rigid) and 0.4071 (relax) on the D-only run (short description based), where rigid and relax denote the two relevance judgment levels. For the English-Chinese BLIR task, our method achieves a mean average precision of 0.2013 (rigid) and 0.2931 (relax) on the T-only run, and 0.1911 (rigid) and 0.2804 (relax) on the D-only run.
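
The re-ranking step described above can be illustrated with a minimal sketch. It is not the implementation used in our runs: the cosine-similarity graph, the iterate-and-clamp update (in the style of standard graph-based label propagation), the initial score of 0.5 for unlabeled documents, and all names (`cosine`, `label_propagation_rank`, `doc_vectors`) are assumptions made for illustration. The pseudo-relevant and pseudo-irrelevant indices are passed in by hand here, whereas the approach above selects them by k-means clustering of the top documents and by picking from the bottom of the list.

```python
import math


def cosine(u, v):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_propagation_rank(vectors, pseudo_relevant, pseudo_irrelevant,
                           max_iter=500, tol=1e-9):
    """Re-rank documents by propagating pseudo labels over a similarity graph.

    Assumption: relevance is a single score in [0, 1]; pseudo-relevant
    documents are clamped to 1.0 and pseudo-irrelevant documents to 0.0.
    """
    n = len(vectors)
    # Affinity matrix without self-loops (cosine similarity is an assumption).
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    # Row-normalize so each row is a transition distribution over neighbors.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total for w in row] if total > 0 else row)

    clamped = {i: 1.0 for i in pseudo_relevant}
    clamped.update({i: 0.0 for i in pseudo_irrelevant})
    scores = [clamped.get(i, 0.5) for i in range(n)]

    for _ in range(max_iter):
        # Each document takes the weighted average score of its neighbors.
        updated = [sum(transition[i][j] * scores[j] for j in range(n))
                   for i in range(n)]
        # Pseudo-labeled documents keep their labels after every step.
        for i, label in clamped.items():
            updated[i] = label
        change = max(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break

    ranking = sorted(range(n), key=lambda i: -scores[i])
    return ranking, scores


# Toy term-frequency vectors for five documents (hypothetical data).
doc_vectors = [
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 3, 2],
    [0, 0, 2, 3],
]
# Document 0 is treated as pseudo relevant, document 4 as pseudo irrelevant.
ranking, scores = label_propagation_rank(doc_vectors, [0], [4])
print(ranking)
print([round(s, 3) for s in scores])
```

In this sketch the unlabeled documents end up ordered by how strongly they are connected to the pseudo-relevant document rather than to the pseudo-irrelevant one, which is the intuition behind using the document graph for re-ranking before query expansion.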