Entropy-based clustering for improving document re-ranking

Document re-ranking sits between initial retrieval and query expansion in an information retrieval system. In this paper, we propose an entropy-based clustering approach for document re-ranking. The within-cluster entropy determines whether two clusters should be merged, and the between-cluster entropy determines how many clusters are reasonable. A suitable cluster is then selected from the clustering result to construct a pseudo-labeled document, and document re-ranking is conducted as in our previous method. Our focus is the clustering strategy applied to the documents returned by initial retrieval. Experiments on NTCIR-5 data show that the approach can improve on the performance of initial retrieval and helps improve the quality of document re-ranking.
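The abstract does not define the two entropy measures, so the Python sketch below fills them in with explicit assumptions and should be read as a hypothetical illustration, not the paper's algorithm. It assumes that a cluster's within-cluster entropy is the Shannon entropy of its pooled term distribution, and it merges the pair whose merge raises the token-weighted within-cluster entropy the least (this increase equals a weighted Jensen-Shannon divergence). It also assumes a between-cluster measure equal to the entropy of all terms minus the weighted within-cluster entropies, and stops merging once that measure would fall below a fraction `min_retained` of its starting value. The names `entropy_cluster`, `between_entropy`, and `min_retained` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: the entropy definitions below are assumptions,
# not the definitions used in the paper.

def entropy(counts: Counter) -> float:
    """Shannon entropy (natural log) of a term-count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def between_entropy(clusters, total_counts: Counter) -> float:
    """Assumed between-cluster measure: H(all terms) minus the token-weighted
    within-cluster entropies (the mutual information between cluster labels
    and terms). Merging two clusters can only lower it."""
    n = sum(total_counts.values())
    if n == 0:
        return 0.0
    within = sum(sum(c.values()) / n * entropy(c) for c, _ in clusters)
    return entropy(total_counts) - within


def entropy_cluster(docs, min_retained=0.5):
    """Greedy agglomerative clustering of tokenized documents.

    docs: list of token lists (e.g. the top documents from initial retrieval).
    min_retained: hypothetical stopping parameter; merging stops once the
    between-cluster measure would fall below this fraction of its initial value.
    """
    clusters = [(Counter(d), [i]) for i, d in enumerate(docs)]
    total = Counter()
    for counts, _ in clusters:
        total.update(counts)
    initial = between_entropy(clusters, total)

    while len(clusters) > 1:
        # Within-cluster criterion: pick the pair whose merge raises the
        # token-weighted within-cluster entropy the least.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a][0], clusters[b][0]
                na, nb = sum(ca.values()), sum(cb.values())
                merged = ca + cb
                cost = entropy(merged) - (na * entropy(ca) + nb * entropy(cb)) / max(na + nb, 1)
                if best is None or cost < best[0]:
                    best = (cost, a, b, merged)
        _, a, b, merged = best
        candidate = [c for k, c in enumerate(clusters) if k not in (a, b)]
        candidate.append((merged, clusters[a][1] + clusters[b][1]))
        # Between-cluster criterion decides how many clusters to keep.
        if initial > 0 and between_entropy(candidate, total) < min_retained * initial:
            break
        clusters = candidate

    return sorted(sorted(ids) for _, ids in clusters)


if __name__ == "__main__":
    docs = [
        ["apple", "fruit", "apple"],
        ["apple", "fruit"],
        ["stock", "market"],
        ["stock", "market", "stock"],
    ]
    print(entropy_cluster(docs))  # prints [[0, 1], [2, 3]]
```

The exhaustive pair search makes the loop at least cubic in the number of documents, which is tolerable only for a short list of top-ranked documents. Choosing the cluster used for the pseudo-labeled document and the re-ranking step itself are not shown.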
