Document Retrieval in Consideration of the Amount of Term Frequencies

We propose a document retrieval method that evaluates the degree of similarity between a query and a document by considering not only term weights but also the amount of term frequencies. Unlike tf-idf term-weighting schemes, the proposed scheme does not use term frequency when calculating the term weight. We evaluated retrieval performance experimentally using a subset of NTCIR-1. The results show that the appropriate parameters for calculating the similarity depend on the number of query terms, and that the proposed scheme outperforms well-known tf-idf schemes in retrieval performance.