Related papers

The Probabilistic Relevance Framework: BM25 and Beyond

Abstract: The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
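
As a rough illustration of the kind of ranking function the abstract refers to, the following is a minimal sketch of BM25 scoring in Python. It is not taken from the paper: the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))` are assumptions chosen for illustration, and documents are assumed to be non-empty lists of tokens.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (hypothetical helper, not from the paper).

    query: list of query tokens
    docs:  list of documents, each a non-empty list of tokens
    k1, b: free parameters (term-frequency saturation, length normalisation)
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Assumed IDF variant; the "1 +" keeps the weight non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency, normalised by relative document length.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Documents that share no term with the query score exactly zero here. BM25F, which the abstract also mentions, combines per-field term frequencies before this saturation step; that extension is not shown.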

References

[1]  William Feller,et al.  An Introduction To Probability Theory And Its Applications , 1950 .

[2]  M. E. Maron,et al.  On Relevance, Probabilistic Indexing and Information Retrieval , 1960, JACM.

[3]  William Feller,et al.  An Introduction to Probability Theory and Its Applications , 1967 .

[4]  Stephen P. Harter,et al.  A probabilistic approach to automatic keyword indexing , 1974 .

[5]  Stephen P. Harter,et al.  A probabilistic approach to automatic keyword indexing. Part I. On the Distribution of Specialty Words in a Technical Literature , 1975, J. Am. Soc. Inf. Sci..

[6]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[7]  W. Bruce Croft,et al.  Using Probabilistic Models of Document Retrieval without Relevance Information , 1979, J. Documentation.

[8]  Stephen E. Robertson,et al.  Probabilistic models of indexing and searching , 1980, SIGIR '80.

[9]  Stephen E. Robertson,et al.  The Unified Probabilistic Model for IR , 1982, SIGIR.

[10]  Stephen E. Robertson,et al.  On Term Selection for Query Expansion , 1991, J. Documentation.

[11]  Michael D. Gordon,et al.  A utility theoretic examination of the probability ranking principle in information retrieval , 1991, J. Am. Soc. Inf. Sci..

[12]  Norbert Fuhr,et al.  Probabilistic Models in Information Retrieval , 1992, Comput. J..

[13]  Stephen E. Robertson,et al.  Okapi at TREC , 1992, TREC.

[14]  Stephen E. Robertson,et al.  Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval , 1994, SIGIR '94.

[15]  William S. Cooper,et al.  Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval , 1995, TOIS.

[16]  Stephen E. Robertson,et al.  Okapi at TREC-5 , 1996, TREC.

[17]  Ellen M. Voorhees,et al.  The fifth text REtrieval conference (TREC-5) , 1997 .

[18]  Stephen E. Robertson,et al.  On relevance weights with little relevance information , 1997, SIGIR '97.

[19]  Charles M. Grinstead,et al.  Introduction to probability , 1999, Statistics for the Behavioural Sciences.

[20]  Fabio Crestani,et al.  “Is this document relevant?…probably”: a survey of probabilistic models in information retrieval , 1998, CSUR.

[21]  Donna K. Harman,et al.  Overview of the Eighth Text REtrieval Conference (TREC-8) , 1999, TREC.

[22]  Stephen E. Robertson,et al.  A probabilistic model of information retrieval: development and comparative experiments - Part 1 , 2000, Inf. Process. Manag..

[23]  Stephen E. Robertson,et al.  A probabilistic model of information retrieval: development and comparative experiments - Part 2 , 2000, Inf. Process. Manag..

[24]  C. J. van Rijsbergen,et al.  Probabilistic models of information retrieval based on measuring the divergence from randomness , 2002, TOIS.

[25]  ChengXiang Zhai,et al.  Probabilistic Relevance Models Based on Document and Query Generation , 2003 .

[26]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[27]  Stephen E. Robertson,et al.  Microsoft Cambridge at TREC 13: Web and Hard Tracks , 2004, TREC.

[28]  Stephen E. Robertson,et al.  Simple BM25 extension to multiple weighted fields , 2004, CIKM '04.

[29]  Stephen E. Robertson,et al.  A new unified probabilistic model , 2004, J. Assoc. Inf. Sci. Technol..

[30]  Hugues Bersini,et al.  Constrained, non-linear, derivative-free, parallel optimization of continuous, high computing load, noisy objective functions , 2004 .

[31]  Stephen E. Robertson,et al.  Threshold Setting and Performance Optimization in Adaptive Filtering , 2002, Information Retrieval.

[32]  Djoerd Hiemstra,et al.  Parsimonious language models for information retrieval , 2004, SIGIR '04.

[33]  Sebastiano Vigna,et al.  MG4J at TREC 2005 , 2005, TREC.

[34]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[35]  Stephen E. Robertson,et al.  Microsoft Cambridge at TREC 14: Enterprise Track , 2005, TREC.

[36]  Stephen E. Robertson,et al.  Relevance weighting for query independent evidence , 2005, SIGIR '05.

[37]  Volker Tresp.  Proceedings of the NIPS 2005 Workshop on Learning to Rank , 2005 .

[38]  W. Bruce Croft,et al.  A Markov random field model for term dependencies , 2005, SIGIR '05.

[39]  Djoerd Hiemstra,et al.  PFTijah: text search in an XML database system , 2006 .

[40]  Stephen E. Robertson,et al.  Optimisation methods for ranking functions with multiple parameters , 2006, CIKM '06.

[41]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[42]  Donald Metzler,et al.  Automatic feature selection in the markov random field model for information retrieval , 2007, CIKM '07.

[43]  Tao Tao,et al.  An exploration of proximity measures in information retrieval , 2007, SIGIR.

[44]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[45]  Tie-Yan Liu,et al.  Learning to rank for information retrieval (LR4IR 2007) , 2007, SIGF.

[46]  Stephen E. Robertson,et al.  On rank-based effectiveness measures and optimization , 2007, Information Retrieval.

[47]  Hugo Zaragoza,et al.  Exploiting Morphological Query Structure Using Genetic Optimisation , 2008, NLDB.

[48]  L. Dworsky An Introduction to Probability , 2008 .

[49]  Hugo Zaragoza,et al.  UCM-Y!R at CLEF 2008 Robust and WSD tasks , 2008, CLEF.

[50]  Yong Yu,et al.  Viewing Term Proximity from a Different Perspective , 2008, ECIR.

[51]  Tie-Yan Liu,et al.  Learning to rank for information retrieval (LR4IR 2008) , 2008, SIGF.

[52]  W. Bruce Croft,et al.  Search Engines - Information Retrieval in Practice , 2009 .

[53]  Tie-Yan Liu,et al.  Learning to rank for information retrieval , 2009, SIGIR.

Cited by
Ранжирование документов в системе поиска, основанной на применении онтологии (Document Ranking in Ontology-Based Information Retrieval System). RCDL, 2012.
The KLE's Subtopic Mining System for the NTCIR-11 IMine Task. NTCIR, 2014.
ICTNET at TREC 2017 Common Core Track. TREC, 2017.
An Exponentiation Method for XML Element Retrieval. TheScientificWorldJournal, 2014.
News Article Retrieval in Context for Event-centric Narrative Creation. IN2WRITING, 2021.
Diagnosing BERT with Retrieval Heuristics. ECIR, 2020.
Integrating extra knowledge into word embedding models for biomedical NLP tasks. 2017 International Joint Conference on Neural Networks (IJCNN), 2017.
The impact of fielding on retrieval performance and bias. 2018.
Identifying Motifs in Folktales using Topic Models. 2013.
Click Through Rate Prediction for Local Search Results. WSDM, 2017.
A Social Search Model for Large Scale Social Networks. ArXiv, 2020.
A Neural Approach to Cross-Lingual Information Retrieval. 2018.
ServiceGroup: A Human-Machine Cooperation Solution for Group Chat Customer Service. SIGIR, 2020.
Biomedical Evidence Generation Engine. 2019.
DSDD: Domain-Specific Dataset Discovery on the Web. CIKM, 2021.
Mining and classifying customer reviews: a survey. Artificial Intelligence Review, 2021.
Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval. Journal of Medical Systems, 2017.
An unsupervised service annotation by review analysis. Int. J. Big Data Intell., 2018.
An Effective Term-Ranking Function for Query Expansion Based on Information Foraging Assessment. MIKE, 2014.
Information Retrieval using Cosine and Jaccard Similarity Measures in Vector Space Model. 2017.