Improving the Relevance-based Superimposition model for IR with automatic keyword extraction

Our previous studies showed that our proposed Relevance-based Superimposition (RS) model for information retrieval (IR) is highly effective for scientific paper archives, where wording varies widely between documents written by different authors. The RS model exploits the relevance of documents by modifying document feature vectors according to that relevance information. This paper presents a method for applying the RS model to general archives in which documents are not accompanied by well-chosen keywords. We investigated automatic feature term extraction and found that the key issues for this extension are cluster refinement and parameter optimization of the RS model. New experiments indicate that the extended method achieves better retrieval precision than that obtained using author-supplied keywords.
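
To make the idea of modifying document feature vectors from relevance information concrete, the following is a minimal illustrative sketch, not the formulation used in this work. It assumes that relevance is expressed as shared cluster labels (for example, a common keyword), that a cluster representative is the mean of its members' term weights, and that a document vector is updated by taking the element-wise maximum with the representatives of its clusters. The function name `superimpose`, the data layout, and both combination rules are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def superimpose(doc_vectors, doc_clusters):
    """Return modified document vectors (illustrative assumption, not the paper's formula).

    doc_vectors:  dict mapping doc_id -> dict of term -> weight
    doc_clusters: dict mapping doc_id -> set of cluster labels (e.g. keywords)
    """
    # Group document ids by the cluster labels they carry.
    members = defaultdict(list)
    for doc_id, labels in doc_clusters.items():
        for label in labels:
            members[label].append(doc_id)

    # Assumed representative: mean term weight over the cluster's members.
    representatives = {}
    for label, ids in members.items():
        total = defaultdict(float)
        for doc_id in ids:
            for term, weight in doc_vectors[doc_id].items():
                total[term] += weight
        representatives[label] = {t: w / len(ids) for t, w in total.items()}

    # Assumed update: element-wise maximum of the document's own weights
    # and the representatives of every cluster it belongs to.
    modified = {}
    for doc_id, vector in doc_vectors.items():
        new_vector = dict(vector)
        for label in doc_clusters.get(doc_id, ()):
            for term, weight in representatives[label].items():
                new_vector[term] = max(new_vector.get(term, 0.0), weight)
        modified[doc_id] = new_vector
    return modified


docs = {
    "d1": {"retrieval": 1.0},
    "d2": {"search": 1.0},
    "d3": {"other": 1.0},
}
clusters = {"d1": {"ir"}, "d2": {"ir"}}  # d3 belongs to no cluster
result = superimpose(docs, clusters)
```

Under these assumptions, `d1` acquires a nonzero weight for "search" even though the term never occurs in it, so a query using one author's wording can still reach a related document written with different wording. Documents without any cluster, such as `d3`, are left unchanged.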