Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

The 2-Poisson model for term frequencies is used to suggest ways of incorporating certain variables in probabilistic models for information retrieval. The variables concerned are within-document term frequency, document length, and within-query term frequency. Simple weighting functions are developed, and tested on the TREC test collection. Considerable performance improvements (over simple inverse collection frequency weighting) are demonstrated.
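As a rough illustration of how within-document term frequency, document length and within-query term frequency can be folded into a term weight alongside inverse collection frequency, here is a minimal Python sketch. It does not reproduce this paper's exact weighting functions. It uses the later BM25-style parameterisation that grew out of this line of work, and the parameter names and defaults (`k1=1.2`, `b=0.75`, `k3=7.0`) are illustrative assumptions. The helper names (`idf`, `term_weight`, `score`) are hypothetical.

```python
import math


def idf(N: int, n: int) -> float:
    """Collection frequency weight with no relevance information.

    N: number of documents in the collection; n: documents containing the term.
    Note: becomes negative when n > N / 2.
    """
    return math.log((N - n + 0.5) / (n + 0.5))


def term_weight(tf: int, qtf: int, dl: float, avdl: float, N: int, n: int,
                k1: float = 1.2, b: float = 0.75, k3: float = 7.0) -> float:
    """BM25-style weight for one term (assumed parameters k1, b, k3).

    tf:  within-document term frequency
    qtf: within-query term frequency
    dl, avdl: document length and average document length
    """
    # Length normalisation: longer-than-average documents need a higher tf
    # to reach the same score.
    K = k1 * ((1 - b) + b * dl / avdl)
    # Saturating tf component: grows with tf but never exceeds k1 + 1.
    tf_part = tf * (k1 + 1) / (K + tf)
    # Saturating query-term-frequency component, bounded by k3 + 1.
    qtf_part = qtf * (k3 + 1) / (k3 + qtf)
    return idf(N, n) * tf_part * qtf_part


def score(query: dict, doc: dict, avdl: float, N: int, df: dict) -> float:
    """Sum term weights over query terms that occur in the document.

    query: term -> qtf; doc: term -> tf; df: term -> document frequency.
    """
    dl = sum(doc.values())  # document length taken as total term count
    return sum(
        term_weight(doc[t], qtf, dl, avdl, N, df[t])
        for t, qtf in query.items()
        if t in doc
    )


if __name__ == "__main__":
    df = {"poisson": 40, "retrieval": 300}
    query = {"poisson": 1, "retrieval": 2}
    doc = {"poisson": 3, "retrieval": 1, "model": 6}
    print(score(query, doc, avdl=12.0, N=1000, df=df))
```

With `tf = 1`, `qtf = 1` and a document of exactly average length, both saturating factors equal 1 and the weight reduces to the plain inverse collection frequency weight, which makes the baseline in the abstract a special case of this family.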
