Paper Information - The Probabilistic Relevance Framework: BM25 and Beyond

Related Papers

The Probabilistic Relevance Framework: BM25 and Beyond

Hugo Zaragoza | Stephen Robertson

Abstract: The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
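The abstract names BM25 and BM25F without stating them. As a rough illustration only, here is a minimal Python sketch of the standard formulas, assuming whitespace-tokenised text, the Robertson/Sparck Jones weight with no relevance information as the IDF component, and the commonly quoted defaults k1 = 1.2 and b = 0.75. The function names, parameter defaults and toy data structures are illustrative assumptions, not code from the monograph.

```python
import math
from collections import Counter

# Illustrative sketch; function names and default parameters are assumptions.


def rsj_idf(n_docs: int, doc_freq: int) -> float:
    """Robertson/Sparck Jones term weight with no relevance information.

    Note: this goes negative for terms that occur in more than half the collection.
    """
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenised document against a tokenised query with BM25.

    Query terms are treated as a set; avgdl and document frequencies are
    recomputed on every call for readability, not efficiency.
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    # Document-length normalisation B = (1 - b) + b * dl / avgdl.
    norm = (1 - b) + b * len(doc) / avgdl
    score = 0.0
    for term in set(query):
        if tf[term] == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        # Saturating tf component tf / (k1 * B + tf). Many implementations multiply
        # by (k1 + 1); that constant rescales scores but leaves the ranking unchanged.
        score += tf[term] / (k1 * norm + tf[term]) * rsj_idf(n_docs, df)
    return score


def bm25f_score(query, doc_fields, corpus_fields, field_weights, field_b, k1=1.2):
    """Simple BM25F: combine per-field term frequencies before saturation.

    doc_fields maps field name -> token list; corpus_fields is a list of such maps.
    field_weights and field_b map field name -> weight and b parameter.
    """
    n_docs = len(corpus_fields)
    # Average length per field (fallback 1.0 avoids dividing by zero for empty fields).
    avg_len = {
        f: (sum(len(d[f]) for d in corpus_fields) / n_docs) or 1.0
        for f in field_weights
    }
    score = 0.0
    for term in set(query):
        pseudo_tf = 0.0
        for f, weight in field_weights.items():
            tokens = doc_fields[f]
            if not tokens:
                continue  # an empty field contributes nothing
            # Each field is length-normalised with its own b and average length.
            norm = (1 - field_b[f]) + field_b[f] * len(tokens) / avg_len[f]
            pseudo_tf += weight * tokens.count(term) / norm
        if pseudo_tf == 0:
            continue
        # Document frequency counts a document if the term appears in any field.
        df = sum(1 for d in corpus_fields if any(term in d[f] for f in field_weights))
        # Saturation is applied once, to the combined pseudo-frequency.
        score += pseudo_tf / (k1 + pseudo_tf) * rsj_idf(n_docs, df)
    return score
```

With a single field of weight 1 and the same k1 and b, `bm25f_score` reduces to `bm25_score`, which makes a convenient sanity check. The point of the BM25F form is that field weights scale term frequencies before the k1 saturation is applied, rather than being applied to separately saturated per-field scores.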

References

[1] William Feller et al. An Introduction to Probability Theory and Its Applications, 1950.

[2] M. E. Maron et al. On Relevance, Probabilistic Indexing and Information Retrieval, 1960, JACM.

[3] William Feller et al. An Introduction to Probability Theory and Its Applications, 1967.

[4] Stephen P. Harter et al. A probabilistic approach to automatic keyword indexing, 1974.

[5] Stephen P. Harter et al. A probabilistic approach to automatic keyword indexing. Part I. On the Distribution of Specialty Words in a Technical Literature, 1975, J. Am. Soc. Inf. Sci.

[6] Stephen E. Robertson et al. Relevance weighting of search terms, 1976, J. Am. Soc. Inf. Sci.

[7] W. Bruce Croft et al. Using Probabilistic Models of Document Retrieval without Relevance Information, 1979, J. Documentation.

[8] Stephen E. Robertson et al. Probabilistic models of indexing and searching, 1980, SIGIR '80.

[9] Stephen E. Robertson et al. The Unified Probabilistic Model for IR, 1982, SIGIR.

[10] Stephen E. Robertson et al. On Term Selection for Query Expansion, 1991, J. Documentation.

[11] Michael D. Gordon et al. A utility theoretic examination of the probability ranking principle in information retrieval, 1991, J. Am. Soc. Inf. Sci.

[12] Norbert Fuhr et al. Probabilistic Models in Information Retrieval, 1992, Comput. J.

[13] Stephen E. Robertson et al. Okapi at TREC, 1992, TREC.

[14] Stephen E. Robertson et al. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval, 1994, SIGIR '94.

[15] William S. Cooper et al. Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval, 1995, TOIS.

[16] Stephen E. Robertson et al. Okapi at TREC-5, 1996, TREC.

[17] Ellen M. Voorhees et al. The fifth text REtrieval conference (TREC-5), 1997.

[18] Stephen E. Robertson et al. On relevance weights with little relevance information, 1997, SIGIR '97.

[19] Charles M. Grinstead et al. Introduction to probability, 1999, Statistics for the Behavioural Sciences.

[20] Fabio Crestani et al. “Is this document relevant? … probably”: a survey of probabilistic models in information retrieval, 1998, CSUR.

[21] Donna K. Harman et al. Overview of the Eighth Text REtrieval Conference (TREC-8), 1999, TREC.

[22] Stephen E. Robertson et al. A probabilistic model of information retrieval: development and comparative experiments - Part 1, 2000, Inf. Process. Manag.

[23] Stephen E. Robertson et al. A probabilistic model of information retrieval: development and comparative experiments - Part 2, 2000, Inf. Process. Manag.

[24] C. J. van Rijsbergen et al. Probabilistic models of information retrieval based on measuring the divergence from randomness, 2002, TOIS.

[25] ChengXiang Zhai et al. Probabilistic Relevance Models Based on Document and Query Generation, 2003.

[26] Michael I. Jordan et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[27] Stephen E. Robertson et al. Microsoft Cambridge at TREC 13: Web and Hard Tracks, 2004, TREC.

[28] Stephen E. Robertson et al. Simple BM25 extension to multiple weighted fields, 2004, CIKM '04.

[29] Stephen E. Robertson et al. A new unified probabilistic model, 2004, J. Assoc. Inf. Sci. Technol.

[30] Hugues Bersini et al. Constrained, non-linear, derivative-free, parallel optimization of continuous, high computing load, noisy objective functions, 2004.

[31] Stephen E. Robertson et al. Threshold Setting and Performance Optimization in Adaptive Filtering, 2002, Information Retrieval.

[32] Djoerd Hiemstra et al. Parsimonious language models for information retrieval, 2004, SIGIR '04.

[33] Sebastiano Vigna et al. MG4J at TREC 2005, 2005, TREC.

[34] Gregory N. Hullender et al. Learning to rank using gradient descent, 2005, ICML.

[35] Stephen E. Robertson et al. Microsoft Cambridge at TREC 14: Enterprise Track, 2005, TREC.

[36] Stephen E. Robertson et al. Relevance weighting for query independent evidence, 2005, SIGIR '05.

[37] Volker Tresp. Proceedings of the NIPS 2005 Workshop on Learning to Rank, 2005.

[38] W. Bruce Croft et al. A Markov random field model for term dependencies, 2005, SIGIR '05.

[39] Djoerd Hiemstra et al. PFTijah: text search in an XML database system, 2006.

[40] Stephen E. Robertson et al. Optimisation methods for ranking functions with multiple parameters, 2006, CIKM '06.

[41] Christopher M. Bishop et al. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[42] Donald Metzler et al. Automatic feature selection in the Markov random field model for information retrieval, 2007, CIKM '07.

[43] Tao Tao et al. An exploration of proximity measures in information retrieval, 2007, SIGIR.

[44] Nasser M. Nasrabadi et al. Pattern Recognition and Machine Learning, 2006, Technometrics.

[45] Tie-Yan Liu et al. Learning to rank for information retrieval (LR4IR 2007), 2007, SIGF.

[46] Stephen E. Robertson et al. On rank-based effectiveness measures and optimization, 2007, Information Retrieval.

[47] Hugo Zaragoza et al. Exploiting Morphological Query Structure Using Genetic Optimisation, 2008, NLDB.

[48] L. Dworsky. An Introduction to Probability, 2008.

[49] Hugo Zaragoza et al. UCM-Y!R at CLEF 2008 Robust and WSD tasks, 2008, CLEF.

[50] Yong Yu et al. Viewing Term Proximity from a Different Perspective, 2008, ECIR.

[51] Tie-Yan Liu et al. Learning to rank for information retrieval (LR4IR 2008), 2008, SIGF.

[52] W. Bruce Croft et al. Search Engines - Information Retrieval in Practice, 2009.

[53] Tie-Yan Liu et al. Learning to rank for information retrieval, 2009, SIGIR.

Cited By

Ранжирование документов в системе поиска, основанной на применении онтологии (Document Ranking in Ontology-Based Information Retrieval System)

The KLE's Subtopic Mining System for the NTCIR-11 IMine Task

ICTNET at TREC 2017 Common Core Track

An Exponentiation Method for XML Element Retrieval (The Scientific World Journal)

News Article Retrieval in Context for Event-centric Narrative Creation

Diagnosing BERT with Retrieval Heuristics

Integrating extra knowledge into word embedding models for biomedical NLP tasks (2017 International Joint Conference on Neural Networks, IJCNN)

The impact of fielding on retrieval performance and bias

Identifying Motifs in Folktales using Topic Models

Click Through Rate Prediction for Local Search Results

A Social Search Model for Large Scale Social Networks

A Neural Approach to Cross-Lingual Information Retrieval

ServiceGroup: A Human-Machine Cooperation Solution for Group Chat Customer Service

Biomedical Evidence Generation Engine.

DSDD: Domain-Specific Dataset Discovery on the Web

Mining and classifying customer reviews: a survey (Artificial Intelligence Review)

Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval (Journal of Medical Systems)

An unsupervised service annotation by review analysis (Int. J. Big Data Intell.)

An Effective Term-Ranking Function for Query Expansion Based on Information Foraging Assessment

Information Retrieval using Cosine and Jaccard Similarity Measures in Vector Space Model