Semi-subsumed Events: A Probabilistic Semantics of the BM25 Term Frequency Quantification

BM25 popularised the asymptotic term frequency quantification TF = tf/(tf + K), where tf is the within-document term frequency and K is a normalisation factor. This paper reports a finding about the meaning of this TF quantification: in the triangle of independence and subsumption, the TF quantification forms the altitude, that is, the middle between independent and subsumed events. We refer to this new assumption as semi-subsumed. The finding provides a well-defined probabilistic assumption that explains the BM25 TF quantification, and it also has wider implications for probability theory.
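The following sketch computes the TF quantification and shows how it saturates towards 1 as tf grows. It also contrasts the exponents applied to a term probability P(t) for tf occurrences. Under independence the exponent is tf (P(t)^tf), and under subsumption it is 1. The "middle" exponent 2·tf/(tf + 1), which equals 2·TF with K = 1 and is the harmonic mean of tf and 1, is an assumed reading of "semi-subsumed" for illustration only. It is not taken from this abstract. The function names `bm25_tf` and `event_exponents` are hypothetical.

```python
def bm25_tf(tf: float, k: float) -> float:
    """Asymptotic BM25 TF quantification: tf / (tf + K)."""
    if tf < 0 or k <= 0:
        raise ValueError("tf must be >= 0 and K must be > 0")
    return tf / (tf + k)


def event_exponents(tf: int) -> tuple[float, float, float]:
    """Exponents applied to P(t) for tf occurrences of a term (illustrative).

    Returns (independent, subsumed, middle):
      independent: tf, i.e. P(t)**tf
      subsumed:    1,  i.e. P(t)**1
      middle:      2*tf/(tf+1) = 2 * bm25_tf(tf, 1), the harmonic mean of tf and 1.
    The "middle" value is an ASSUMED reading of "semi-subsumed", not from the abstract.
    """
    return float(tf), 1.0, 2 * bm25_tf(tf, 1.0)


if __name__ == "__main__":
    for tf in (1, 2, 5, 10, 100):
        indep, subsumed, middle = event_exponents(tf)
        print(
            f"tf={tf:>3}  TF(K=1)={bm25_tf(tf, 1.0):.3f}  "
            f"exponents: independent={indep:g}, middle={middle:.3f}, subsumed={subsumed:g}"
        )
```

In this sketch, the middle exponent always lies between the subsumed exponent (1) and the independent exponent (tf) and stays below 2. This mirrors the bounded, saturating shape of TF itself.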