On modeling information retrieval with probabilistic inference

This article examines and extends the logical models of information retrieval in the context of probability theory. The fundamental notions of term weights and relevance are given probabilistic interpretations. A unified framework is developed for modeling the retrieval process with probabilistic inference. This approach provides a common conceptual and mathematical basis for many retrieval models, such as the Boolean, fuzzy set, vector space, and conventional probabilistic models. Within this framework, the underlying assumptions employed by each model are identified, and the inherent relationships among these models are analyzed. Although this article is mainly a theoretical analysis of probabilistic inference for information retrieval, simple examples illustrate practical methods for estimating the required probabilities.
