Pseudo relevance feedback using semantic clustering in relevance language model
Pseudo relevance feedback has been shown to be a generally effective technique for improving retrieval effectiveness, but noise in the top retrieved documents can still cause topic drift, which degrades performance on certain topics. Viewing a document as an interaction of a set of independent hidden topics, we propose a novel semantic clustering technique based on independent component analysis. Then, within the language modeling framework, we apply the resulting semantic topic clusters to the query sampling process, so that sampling depends on the activated topics rather than on the individual document language models. The result is a semantic-cluster-based relevance language model that uses pseudo relevance feedback without requiring any relevance training information. We evaluated the model on five TREC data sets. The experiments show that our model significantly improves retrieval performance over traditional language models, including relevance-based and clustering-based retrieval language models. Most of the improvement comes from estimating the relevance model on the semantic clusters that are closely related to the query.
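The idea of sampling from topic clusters instead of individual documents can be illustrated with a minimal sketch. The abstract does not give the estimation formulas, so everything below is an assumption: the clusters are taken as already computed (the independent component analysis step is not shown), cluster priors are uniform, smoothing is Jelinek-Mercer, and the mixture follows the familiar relevance-model form P(w|Q) ∝ Σ_c P(w|c) P(c) Π_q P(q|c). The function names `unigram_model` and `cluster_relevance_model` and the toy data are hypothetical.

```python
from collections import Counter
from math import prod


def unigram_model(tokens, background, lam=0.5):
    """Jelinek-Mercer smoothed unigram model (smoothing choice is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: lam * counts[w] / total + (1 - lam) * p_bg
            for w, p_bg in background.items()}


def cluster_relevance_model(query, clusters, lam=0.5):
    """Estimate P(w|Q) by mixing cluster models weighted by query likelihood.

    clusters: dict mapping a cluster id to a list of tokenized feedback
    documents. The cluster assignment is assumed to be given.
    """
    all_tokens = [t for docs in clusters.values() for d in docs for t in d]
    bg_counts = Counter(all_tokens)
    background = {w: c / len(all_tokens) for w, c in bg_counts.items()}

    prior = 1.0 / len(clusters)  # uniform cluster prior (assumption)
    scores = Counter()
    for docs in clusters.values():
        tokens = [t for d in docs for t in d]
        model = unigram_model(tokens, background, lam)
        # Likelihood of the query under this cluster; query terms absent
        # from the feedback vocabulary get a small floor probability.
        q_lik = prod(model.get(q, 1e-9) for q in query)
        for w, p in model.items():
            scores[w] += p * prior * q_lik

    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}


# Toy feedback set: two hypothetical topic clusters for an ambiguous query.
clusters = {
    "animal": [["jaguar", "cat", "jungle"], ["jaguar", "prey", "jungle"]],
    "car": [["jaguar", "car", "engine"], ["car", "dealer", "engine"]],
}
rm = cluster_relevance_model(["jaguar", "jungle"], clusters)
top_terms = sorted(rm, key=rm.get, reverse=True)[:3]
```

In this toy setting the cluster whose language model better explains the query dominates the mixture, so terms from the off-topic cluster receive less weight in the expanded query model. That is the mechanism by which cluster-level sampling is meant to limit topic drift.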