Non-binary independence model

In the binary independence model [YuSa], the document vectors are binary vectors, and a weight, known as the term precision weight, is assigned to each query term. It is shown in [YuSa] that its retrieval performance will be better than that of assigning equal weights to all query terms. By taking the logarithm of the term precision weight, it is shown in [RoSp] that retrieval is optimal under the binary independence model. When a term occurs more than once in a document, there is a loss of information if its presence but not its actual number of occurrences is recorded (as in the binary independence model). Thus, it is beneficial to make use of the additional frequency information to obtain better retrieval performance than the binary independence model. Previous attempts to make use of the frequency information gave little or no improvement [RaSl, Lose] when the 2-Poisson model [BoSw, Hart] was applied. However, an ad hoc method in [RaSY] using a technique in [Crof] yields some improvement. In this paper, the distributions of the terms in the set of relevant documents and their distributions in the set of irrelevant documents are given. Based on the given distributions, weights are assigned to terms in the documents (unlike the situation in which weights are assigned to query terms in the binary independence model). It is shown that the assignment yields optimal retrieval. Experimental results using two actual collections show that there is a significant improvement in retrieval performance over the binary independence model.

Permission to copy without fee all or part of this material is granted provided that the copyright notice of the "Organization of the 1984-ACM Conference on Research and Development in Information Retrieval" and the title of the publication and its date appear.
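
The contrast between recording only a term's presence and recording its within-document frequency can be illustrated with a minimal sketch. It is not the paper's weighting formula. It assumes that, with terms treated as independent, a document is scored by summing per-term log-likelihood ratios between the relevant and irrelevant distributions. The function names, the smoothing constant eps, and the toy distributions are all hypothetical.

```python
import math

# Hypothetical toy distributions: P(within-document frequency = f) for one term,
# in the relevant set (REL) and in the irrelevant set (IRR).
REL = {"retrieval": {0: 0.2, 1: 0.3, 2: 0.5}}
IRR = {"retrieval": {0: 0.7, 1: 0.2, 2: 0.1}}


def binarize(dist):
    """Collapse a frequency distribution to P(term present), i.e. P(f >= 1)."""
    return sum(prob for f, prob in dist.items() if f >= 1)


def binary_weight(rel_dist, irr_dist, f):
    """Log-likelihood ratio when only presence or absence is recorded.

    The difference between the 'present' and 'absent' values is the familiar
    log-odds weight log(p(1-q) / (q(1-p))).
    """
    p, q = binarize(rel_dist), binarize(irr_dist)
    if f >= 1:
        return math.log(p / q)
    return math.log((1 - p) / (1 - q))


def frequency_weight(rel_dist, irr_dist, f, eps=1e-6):
    """Log-likelihood ratio for the actual frequency f.

    eps is an assumed floor for frequencies missing from a distribution.
    """
    return math.log(rel_dist.get(f, eps) / irr_dist.get(f, eps))


def score(doc, rel, irr, weight):
    """Sum per-term weights over the vocabulary, assuming term independence."""
    return sum(weight(rel[t], irr[t], doc.get(t, 0)) for t in rel)


doc_once = {"retrieval": 1}
doc_twice = {"retrieval": 2}

for name, weight in (("binary", binary_weight), ("frequency", frequency_weight)):
    print(name, score(doc_once, REL, IRR, weight), score(doc_twice, REL, IRR, weight))
```

With these toy numbers the binary scoring cannot separate the two documents, while the frequency-based scoring ranks the document with two occurrences higher. This is the kind of information loss the passage describes.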