A simple probabilistic model for the relevance assessment of documents

Abstract. When assessing the relevance of documents, different jurors usually do not agree completely. A simple model is set up to take this fact into account by treating the relevance assigned by a juror as a random variable. The model leads to some interesting conclusions. The worst possible way to assess relevance is a mere bisection into relevant and irrelevant documents. Even an ideal system cannot consistently find all relevant documents and only those, a limitation that is well known empirically. The retrieval system itself should assign a measure of relevance rather than only divide the set of all documents into those found and those not found; in particular, Boolean operations should be supplemented by a ranking algorithm.
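
The central assumption, that a juror's relevance verdict is a random variable, can be illustrated with a minimal simulation. The sketch below is not the model of the paper itself: it assumes, purely for illustration, that each document carries a latent probability of being judged relevant and that each juror draws an independent binary verdict from it. The probability values, the 0.5 threshold, and the names simulate_juror and precision_recall are all hypothetical.

```python
import random


def simulate_juror(probabilities, rng):
    """One juror's binary verdicts: document i is judged relevant with probability p_i."""
    return [rng.random() < p for p in probabilities]


def precision_recall(retrieved, judged):
    """Precision and recall of a retrieved set against one juror's verdicts."""
    hits = sum(1 for r, j in zip(retrieved, judged) if r and j)
    n_retrieved = sum(retrieved)
    n_relevant = sum(judged)
    precision = hits / n_retrieved if n_retrieved else 1.0
    recall = hits / n_relevant if n_relevant else 1.0
    return precision, recall


# Hypothetical latent probabilities that a juror calls each document relevant.
probabilities = [0.95, 0.8, 0.6, 0.4, 0.2, 0.05]

# An "ideal" system is assumed here to know these probabilities exactly.
# Boolean output: retrieve every document more likely relevant than not.
ideal_retrieved = [p >= 0.5 for p in probabilities]

# Ranked output: order documents by decreasing probability of relevance.
ranking = sorted(range(len(probabilities)),
                 key=lambda i: probabilities[i], reverse=True)

# Compare the fixed Boolean result with many independently drawn jurors.
rng = random.Random(0)
results = [precision_recall(ideal_retrieved, simulate_juror(probabilities, rng))
           for _ in range(1000)]
perfect = sum(1 for p, r in results if p == 1.0 and r == 1.0)
print(f"jurors in full agreement with the ideal system: {perfect} of {len(results)}")
```

Under these assumptions the Boolean result set matches a given juror exactly only when every verdict happens to fall on the expected side of the threshold, so precision and recall are rarely both equal to one. The ranked list keeps the graded information that the bisection discards.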