A Study on Pseudo Labeled Document Constructed for Document Re-ranking

Document re-ranking is an intermediate module in an information retrieval system. Its goal is to move documents that are more relevant to the query into higher ranks. This benefits subsequent automatic query expansion and ultimately improves the performance of the whole retrieval system. In this paper, we construct a pseudo labeled document based on the pseudo-relevance feedback principle. We then examine how re-ranking performance depends on two parameters used to build it: the number of top-ranked documents taken from the initial retrieval, and the number of key terms extracted from those documents. Experiments show that the constructed pseudo labeled document substantially improves document re-ranking, which is the main contribution of this paper. The experiments also show that re-ranking performance decreases as the number of top documents increases, and increases as the number of key terms extracted from these documents increases.
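The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how key terms are weighted or how documents are scored against the pseudo labeled document, so this sketch makes three assumptions. It pools raw term counts over the top k documents of the initial ranking, keeps the m highest-count terms as the pseudo labeled document, and re-ranks every document by cosine similarity to it. The helper names build_pseudo_document, cosine and rerank, and the toy documents, are hypothetical.

```python
import math
from collections import Counter


def build_pseudo_document(ranked_docs, k, m):
    """Hypothetical helper (assumed weighting: raw counts).
    Pool term counts from the top-k documents of the initial ranking and keep
    the m most frequent terms as the pseudo labeled document (term -> weight)."""
    pooled = Counter()
    for doc in ranked_docs[:k]:
        pooled.update(doc)  # each doc is a list of tokens
    return dict(pooled.most_common(m))


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(ranked_docs, k, m):
    """Return indices into ranked_docs ordered by similarity to the pseudo
    labeled document, most similar first. Python's sort is stable even with
    reverse=True, so ties keep their initial-retrieval order."""
    pseudo = build_pseudo_document(ranked_docs, k, m)
    vectors = [Counter(doc) for doc in ranked_docs]
    return sorted(range(len(ranked_docs)),
                  key=lambda i: cosine(vectors[i], pseudo),
                  reverse=True)


if __name__ == "__main__":
    # Toy initial ranking: the off-topic document sits at rank 2.
    docs = [
        ["query", "expansion", "retrieval"],
        ["pasta", "recipe"],
        ["retrieval", "query", "feedback"],
        ["retrieval", "ranking", "query", "feedback"],
    ]
    print(build_pseudo_document(docs, k=3, m=2))  # {'query': 2, 'retrieval': 2}
    print(rerank(docs, k=3, m=2))                 # [0, 2, 3, 1]
```

Here k and m play the roles of the two parameters studied in the paper: the number of top documents and the number of key terms. On the toy data, the off-topic document at initial rank 2 drops to the last position. Note that with k = 3 it still contributed two low-count terms to the pooled vocabulary, which shows how a larger k can let non-relevant documents add noise.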
