Multinomial Randomness Models for Retrieval with Document Fields

Document fields, such as the title or the headings of a document, offer a way to consider the structure of documents for retrieval. Most of the proposed approaches in the literature employ either a linear combination of scores assigned to different fields, or a linear combination of frequencies in the term frequency normalisation component. In the context of the Divergence From Randomness framework, we have a sound opportunity to integrate document fields in the probabilistic randomness model. This paper introduces novel probabilistic models for incorporating fields in the retrieval process using a multinomial randomness model and its information theoretic approximation. The evaluation results from experiments conducted with a standard TREC Web test collection show that the proposed models perform as well as a state-of-the-art field-based weighting model, while at the same time, they are theoretically founded and more extensible than current field-based models.

Iadh Ounis | Vassilis Plachouras
