Probabilistic models for document retrieval: a comparison of performance on experimental and synthetic data bases

Probabilistic document retrieval systems consistent with the two Poisson independence model outperform systems based on the binary independence model if the terms are distributed as described by the two Poisson model's assumptions. The Two Poisson Effectiveness Hypothesis suggests that retrieval models based upon the two Poisson model will outperform binary independence models when used on a “real-world” database, where the assumptions of term independence and two Poisson term occurrence distributions fail to hold, because the added information obtained from incorporating term frequencies will more than compensate for the non-Poisson distributions of terms. Searches of the MED1033 database suggest the opposite: if terms are not independent and frequencies of term occurrence are not distributed in a two Poisson manner, the binary independence sequential retrieval model outperforms the two Poisson independence retrieval model.
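
The difference between the two models can be illustrated with a minimal sketch of how each might weight a single query term. This is not the procedure used in the study: the function names and parameter values are hypothetical, and it assumes the common textbook formulations, in which the binary independence model uses only the presence or absence of a term, while the two Poisson independence model treats the within-document term frequency as Poisson distributed with one mean in relevant documents and another in non-relevant documents.

```python
import math


def binary_independence_weight(present, p, q):
    """Log-likelihood-ratio weight of one term under binary independence.

    p = P(term present | relevant), q = P(term present | non-relevant).
    Both values are assumed to be known; they are illustrative only.
    """
    if present:
        return math.log(p / q)
    return math.log((1.0 - p) / (1.0 - q))


def poisson_log_pmf(k, lam):
    """Log of the Poisson probability of observing frequency k with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def two_poisson_weight(freq, lam_rel, lam_non):
    """Log-likelihood-ratio weight of one term under the two Poisson model.

    lam_rel and lam_non are the assumed mean term frequencies in relevant
    and non-relevant documents (hypothetical parameters).
    """
    return poisson_log_pmf(freq, lam_rel) - poisson_log_pmf(freq, lam_non)


# Two documents containing the same term once and five times.
for freq in (1, 5):
    bi = binary_independence_weight(freq > 0, p=0.8, q=0.3)
    tp = two_poisson_weight(freq, lam_rel=3.0, lam_non=0.5)
    print(f"freq={freq}  binary={bi:.3f}  two_poisson={tp:.3f}")
```

Under these assumptions the binary weight is identical for both documents, whereas the two Poisson weight grows with term frequency. That extra frequency information is what the hypothesis expects to help, and it is also what can mislead the ranking when the Poisson assumption does not hold.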