TREC-7 Experiments: Query Expansion Method Based on Word Contribution

This is KDD R&D Laboratories' first participation in TREC. In this participation, we focused on experiments with a novel method of query expansion. The query expansion method described in this paper is based on a measure we call "word contribution". Word contribution expresses the influence of a word on the similarity between the query and a document. From our data, we found that words with highly negative contribution can be considered to express the characteristics of the data (query or document) in which they occur. We propose extracting such words from documents highly similar to a query and adding them to the original query to generate an expanded query. We conducted experiments to evaluate this method and report the results in this paper. We submitted three official ad hoc runs (KD70000, KD71010q, KD71010s) to TREC-7. However, the data we used for these runs were generated by a buggy morphological analysis program, which we consider a major cause of our poor results. Since the official submission, we have fixed these bugs and reconstructed our data. The results described in this paper are based on these new data and on some experiments conducted after the TREC-7 conference.
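As a rough illustration of the idea summarized above, the following Python sketch computes a contribution score and uses it to expand a query. The abstract does not state the formula, so the sketch makes assumptions: similarity is taken to be cosine similarity over raw term counts, and the contribution of a word is taken to be the similarity of the query and document minus their similarity after that word is removed from both. The function names (`cosine`, `contribution`, `expand`) and the parameters `top_docs` and `n_words` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contribution(word, query, doc):
    """Assumed definition: similarity with the word minus similarity without it."""
    query_wo = Counter({w: c for w, c in query.items() if w != word})
    doc_wo = Counter({w: c for w, c in doc.items() if w != word})
    return cosine(query, doc) - cosine(query_wo, doc_wo)


def expand(query, docs, top_docs=2, n_words=2):
    """Add the most negatively contributing words from the top-ranked documents."""
    ranked = sorted(docs, key=lambda d: cosine(query, d), reverse=True)[:top_docs]
    scores = {}
    for doc in ranked:
        for word in doc:
            if word in query:
                continue  # only words new to the query are expansion candidates
            c = contribution(word, query, doc)
            scores[word] = min(c, scores.get(word, c))
    picked = sorted(scores, key=scores.get)[:n_words]  # most negative first
    expanded = Counter(query)
    for word in picked:
        expanded[word] += 1  # weight of 1 is an arbitrary choice for this sketch
    return expanded


query = Counter({"query": 1, "expansion": 1})
docs = [
    Counter({"query": 1, "expansion": 1, "retrieval": 3}),
    Counter({"weather": 2, "report": 1}),
]
expanded = expand(query, docs, top_docs=1, n_words=1)
```

Under these assumptions, a word shared by the query and the document has positive contribution, while a word that occurs in only one of them lowers the similarity and therefore has negative contribution. This is consistent with the observation that highly negative words characterize the data in which they occur.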