Paper information - Entropy-based clustering for improving document re-ranking

Entropy-based clustering for improving document re-ranking

Document re-ranking sits between initial retrieval and query expansion in an information retrieval system. In this paper, we propose an entropy-based clustering approach for document re-ranking. The value of within-cluster entropy determines whether two clusters should be merged, and the value of between-cluster entropy determines how many clusters are reasonable. The next step is to select a suitable cluster from the clustering result, use it to construct a pseudo labeled document, and conduct document re-ranking as in our previous method. We focus on the clustering strategy for documents after initial retrieval. Experiments with NTCIR-5 data show that the approach can improve the performance of initial retrieval and helps improve the quality of document re-ranking.
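
The abstract does not give the entropy definitions, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Shannon entropy over pooled term counts, defines within-cluster entropy as the size-weighted average of per-cluster entropies, and defines between-cluster entropy as the entropy of the whole collection minus the within-cluster entropy. The function names and the `min_between` threshold are hypothetical.

```python
import math
from collections import Counter


def term_entropy(counts):
    """Shannon entropy (bits) of a term-count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def within_entropy(clusters):
    """Size-weighted average entropy of the clusters (assumed definition)."""
    total = sum(sum(c.values()) for c in clusters)
    if total == 0:
        return 0.0
    return sum(sum(c.values()) / total * term_entropy(c) for c in clusters)


def between_entropy(clusters):
    """Entropy of the pooled collection minus within-cluster entropy (assumed)."""
    pooled = sum(clusters, Counter())
    return term_entropy(pooled) - within_entropy(clusters)


def cluster_documents(docs, min_between=0.5):
    """Greedy agglomerative clustering of tokenized documents.

    Each step merges the pair whose merge gives the lowest within-cluster
    entropy. Merging stops when the next merge would push between-cluster
    entropy below `min_between` (a hypothetical stopping threshold).
    Returns a list of clusters, each a list of document indices.
    """
    clusters = [Counter(d) for d in docs]
    members = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                rest = clusters[:i] + clusters[i + 1:j] + clusters[j + 1:]
                candidate = rest + [clusters[i] + clusters[j]]
                score = within_entropy(candidate)
                if best is None or score < best[0]:
                    best = (score, i, j, candidate)
        _, i, j, candidate = best
        if between_entropy(candidate) < min_between:
            break  # clusters would no longer be well separated
        members = (members[:i] + members[i + 1:j] + members[j + 1:]
                   + [members[i] + members[j]])
        clusters = candidate
    return members


docs = [
    ["apple", "fruit"],
    ["apple", "fruit", "fruit"],
    ["car", "engine"],
    ["car", "engine", "engine"],
]
groups = cluster_documents(docs)
```

On this toy input the two fruit documents and the two car documents end up in separate clusters, because merging across topics would raise within-cluster entropy far more than merging within a topic. Choosing which cluster to use for the pseudo labeled document is not covered by this sketch.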
