The Probabilistic Relevance Framework: BM25 and Beyond
Abstract: The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document metadata (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features and parameter optimisation for models with free parameters.
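For orientation, BM25 can be sketched in a few lines of Python. This is a minimal illustration of the commonly cited form of the scoring function, not code from the work itself. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the pre-tokenised list-of-lists input are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against the tokenised `query`.

    Hypothetical helper for illustration; k1 and b are free parameters
    and the defaults here are assumed, not prescribed by the source.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Document-length normalisation, controlled by b.
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Classic inverse-document-frequency style weight; it becomes
            # negative for terms found in more than half of the documents.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency component, controlled by k1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    ["probabilistic", "retrieval", "model"],
    ["web", "search", "engine"],
    ["cooking", "recipes"],
    ["bm25", "ranking", "bm25"],
]
scores = bm25_scores(["bm25"], docs)
```

Documents that contain none of the query terms score zero. The two free parameters control term-frequency saturation (`k1`) and document-length normalisation (`b`), which is why parameter optimisation is relevant for models of this kind.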