Evaluation of the 2-Poisson model as a basis for using term frequency data in searching

Early work on probabilistic models of retrieval assumed a binary document representation that indicates only the presence or absence of index terms. The 2-Poisson (TP) model, originally proposed to describe how the occurrence frequency of specialty words is distributed across a collection, has since been used to develop retrieval strategies that incorporate term frequency information. This work further investigates the use of the TP model in that context. It shows that search effectiveness can be further enhanced with this model when no relevance information is assumed. Furthermore, when the term weights proposed in this work are used in conjunction with weights known as term significance weights, the results are very encouraging.
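
For readers unfamiliar with the TP model, the following is a minimal sketch of its usual form: the within-document frequency of a term is treated as a mixture of two Poisson distributions, one for documents in which the term is a central topic and one for the remainder. The function names, the label "elite" for the first class, and the parameter values used below are illustrative assumptions and do not come from this work. The sketch shows only the distribution itself, not the term weights proposed here.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k occurrences under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def two_poisson_pmf(k, pi, lam_elite, lam_other):
    """Mixture of two Poissons: pi is the assumed share of 'elite' documents."""
    return pi * poisson_pmf(k, lam_elite) + (1 - pi) * poisson_pmf(k, lam_other)


def elite_posterior(k, pi, lam_elite, lam_other):
    """Posterior probability (Bayes' rule) that a document with k occurrences is 'elite'."""
    numerator = pi * poisson_pmf(k, lam_elite)
    return numerator / two_poisson_pmf(k, pi, lam_elite, lam_other)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    for tf in range(6):
        print(tf, round(elite_posterior(tf, 0.3, 4.0, 0.5), 4))
```

With an "elite" mean larger than the other mean, the posterior rises with the term frequency, which is the intuition behind letting term frequency contribute to a term's weight.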