Paper Information - Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants

When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambiguity “matter”? We attempt to answer this question with a large-scale reproducibility study of BM25, considering eight variants. Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene’s often maligned approximation of document length. As an added benefit, our empirical approach takes advantage of databases for rapid IR prototyping, which validates both the feasibility and methodological advantages claimed in previous work.
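To make the ambiguity concrete, the following is a minimal sketch of how two BM25 formulations can differ only in their IDF component. It is not taken from the paper: the function names, the toy collection statistics, and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions, and the "plus one" IDF is the form commonly attributed to Lucene rather than a statement about any specific Lucene release.

```python
import math


def idf_robertson(n_docs, doc_freq):
    """Textbook Robertson-style IDF; negative when a term is in over half the documents."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def idf_plus_one(n_docs, doc_freq):
    """IDF with 1 added inside the log (form commonly attributed to Lucene); never negative."""
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    """Score contribution of one query term for one document.

    k1 and b are illustrative defaults (assumption), not values from the paper.
    """
    if tf == 0:
        return 0.0
    # Length normalization: documents longer than average are penalized.
    norm = 1.0 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, n_docs, doc_freqs, idf_fn):
    """Sum per-term scores over the query; idf_fn selects the IDF variant."""
    total = 0.0
    for term in query_terms:
        df = doc_freqs.get(term, 0)
        if df == 0:
            continue  # term is absent from the collection
        idf = idf_fn(n_docs, df)
        total += bm25_term_score(doc_tf.get(term, 0), doc_len, avg_doc_len, idf)
    return total


if __name__ == "__main__":
    # Hypothetical toy statistics: 10 documents, "the" appears in 8 of them.
    doc_freqs = {"the": 8, "okapi": 1}
    doc_tf = {"the": 5, "okapi": 2}
    for idf_fn in (idf_robertson, idf_plus_one):
        score = bm25_score(["the", "okapi"], doc_tf, 120, 100.0, 10, doc_freqs, idf_fn)
        print(idf_fn.__name__, round(score, 4))
```

Under these assumptions the two variants disagree most on very common terms, where the Robertson-style IDF turns negative and the "plus one" form stays positive. Whether such differences affect retrieval effectiveness is the empirical question the abstract addresses.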
