Multinomial Randomness Models for Retrieval with Document Fields

Document fields, such as the title or the headings of a document, offer a way to take the structure of documents into account during retrieval. Most approaches proposed in the literature employ either a linear combination of the scores assigned to different fields, or a linear combination of the per-field term frequencies in the term frequency normalisation component. The Divergence From Randomness (DFR) framework offers a principled opportunity to integrate document fields directly into the probabilistic randomness model. This paper introduces novel probabilistic models that incorporate fields into the retrieval process using a multinomial randomness model and its information-theoretic approximation. Experiments conducted with a standard TREC Web test collection show that the proposed models perform as well as a state-of-the-art field-based weighting model, while being theoretically founded and more extensible than current field-based models.
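The abstract does not state the model formulas, so the following Python sketch only illustrates the general idea under explicit assumptions; it is not the paper's definitive formulation. Each field's term frequency is first rescaled with DFR Normalisation 2, using hypothetical per-field parameters `cs` and weights `weights`. The `F` collection occurrences of a term are then modelled as multinomially distributed over the `k` fields of a document and the rest of the collection, with an assumed probability of `1/(k*N)` per field. The information content is computed either exactly (via log-gamma) or with a first-order Stirling approximation, which reduces to `F` times the relative entropy between observed and expected proportions. The final division by `tfn + 1` is an assumed Laplace-style after-effect, as in other DFR models. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalisation2(tf: float, length: float, avg_length: float, c: float) -> float:
    """DFR Normalisation 2: rescale a raw field frequency by the field's length."""
    if length <= 0 or avg_length <= 0:
        raise ValueError("field lengths must be positive")
    return tf * math.log2(1.0 + c * avg_length / length)


def neg_log2_multinomial(counts: Sequence[float], probs: Sequence[float]) -> float:
    """Exact -log2 of the multinomial probability of `counts` under `probs`.

    lgamma(n + 1) equals ln(n!) for integers and extends it to the
    non-integer frequencies that Normalisation 2 produces.
    """
    total = sum(counts)
    ln_p = math.lgamma(total + 1.0)
    for n, p in zip(counts, probs):
        ln_p += n * math.log(p) - math.lgamma(n + 1.0)
    return -ln_p / math.log(2.0)


def neg_log2_kl_approx(counts: Sequence[float], probs: Sequence[float]) -> float:
    """First-order Stirling approximation: -log2 P ~ total * KL(q || p), q = counts/total."""
    total = sum(counts)
    return sum(n * math.log2(n / (total * p)) for n, p in zip(counts, probs) if n > 0)


def field_weight(
    tfs: Sequence[float],          # raw term frequency in each field of document d
    lengths: Sequence[float],      # length of each field of d (tokens)
    avg_lengths: Sequence[float],  # average length of each field in the collection
    cs: Sequence[float],           # hypothetical per-field Normalisation 2 parameters
    weights: Sequence[float],      # hypothetical per-field importance weights
    F: float,                      # frequency of the term in the whole collection
    N: int,                        # number of documents in the collection
    exact: bool = True,
) -> float:
    k = len(tfs)
    tfns: List[float] = [
        w * normalisation2(tf, l, al, c)
        for tf, l, al, c, w in zip(tfs, lengths, avg_lengths, cs, weights)
    ]
    tfn = sum(tfns)
    if tfn > F:
        raise ValueError("normalised frequency exceeds collection frequency F")
    # Assumption: each field of d is one of k*N equiprobable "slots";
    # the remaining occurrences fall in the rest of the collection.
    p_field = 1.0 / (k * N)
    counts = tfns + [F - tfn]
    probs = [p_field] * k + [1.0 - k * p_field]
    if exact:
        info = neg_log2_multinomial(counts, probs)
    else:
        info = neg_log2_kl_approx(counts, probs)
    # Assumed Laplace-style after-effect normalisation.
    return info / (tfn + 1.0)
```

For example, `field_weight([2, 1], [8, 300], [6.5, 450.0], [1.0, 1.0], [3.0, 1.0], F=5000, N=1_000_000)` scores a term occurring twice in a short title and once in a long body, with the title weighted three times as heavily. Passing `exact=False` swaps in the relative-entropy approximation. Because the approximation drops Stirling's lower-order correction terms, it gives somewhat lower values than the exact computation for small frequencies.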
