Document Retrieval in Consideration of the Amount of Term Frequencies

We propose a document retrieval scheme that evaluates the degree of similarity between a query and a document by considering not only term weights but also the amount of term frequencies. Unlike tf-idf term-weighting schemes, the proposed scheme does not use term frequency when calculating term weights. We conducted a retrieval performance evaluation experiment on a subset of the NTCIR-1 test collection. The results show that the appropriate parameters for calculating similarity depend on the number of query terms, and that the proposed scheme outperforms well-known tf-idf schemes in retrieval performance.
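For reference, the kind of tf-idf baseline the abstract compares against can be sketched as follows. This is a minimal sketch of one common variant, assumed for illustration: the weight is raw term frequency times log(N/df), and documents are ranked by cosine similarity. The specific tf-idf schemes used in the experiment and the proposed scheme's own similarity formula are not given here, and all function names are hypothetical. The sketch shows where tf-idf folds term frequency directly into the term weight, which is the step the proposed scheme avoids.

```python
import math
from collections import Counter


def idf_table(docs):
    """Map each term to log(N / df). One common idf variant, assumed here."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf_vector(terms, idf):
    """Hypothetical helper: weight = raw tf * idf (tf enters the weight itself)."""
    tf = Counter(terms)
    return {t: f * idf.get(t, 0.0) for t, f in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending tf-idf cosine score."""
    idf = idf_table(docs)
    q = tfidf_vector(query, idf)
    scores = [(i, cosine(q, tfidf_vector(d, idf))) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

For example, with the toy documents `[["retrieval", "term", "weight", "term"], ["query", "expansion", "retrieval"], ["parsing", "morphology"]]`, the query `["term", "weight"]` ranks the first document highest and gives the other two a score of 0.0, since they share no query terms.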

Yoshihiro Ueda, Hiroshi Umemoto, Tadanobu Miyauchi
