A study of probability kinematics in information retrieval

We analyze the kinematics of probabilistic term weights at retrieval time for different Information Retrieval models. We present four models based on different notions of probabilistic retrieval. Two of these models are based on classical probability theory and can be regarded as prototypes of models long in use in Information Retrieval, such as the Vector Space Model and the Probabilistic Model. The other two are based on imaging, a logical technique for evaluating the probability of a conditional; one is a generalization of the other. We analyze the transfer of probabilities that occurs in the term space at retrieval time for these four models, compare their retrieval performance on classical test collections, and discuss the results. We believe that our results offer useful suggestions on how to improve existing probabilistic models of Information Retrieval by taking term-term similarity into account.
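To make the idea of probability transfer in the term space concrete, the following is a minimal sketch of imaging on a document, not the models evaluated in this study. It assumes that each term carries a prior probability, that imaging moves the probability of every term absent from the document to its most similar term in the document, and that the score of the conditional "document implies query" is the transferred mass that ends up on query terms. The function names, the `k` parameter (with `k > 1` standing in for a generalized variant that splits mass in proportion to similarity), and the toy similarity table are all illustrative assumptions.

```python
# Toy term-term similarity table (hypothetical values, symmetric lookup).
SIM = {
    frozenset(("dog", "cat")): 0.8,
    frozenset(("dog", "car")): 0.1,
    frozenset(("bus", "car")): 0.9,
    frozenset(("bus", "cat")): 0.1,
}


def similarity(a, b):
    """Return the assumed similarity between two terms (0.0 if unknown)."""
    return SIM.get(frozenset((a, b)), 0.0)


def image_on_document(prior, doc_terms, similarity, k=1):
    """Transfer prior term probabilities onto the terms of a document.

    Terms in the document keep their mass. Every other term gives its mass
    to its k most similar document terms: all of it to the single closest
    term when k == 1, split in proportion to similarity when k > 1.
    Assumes doc_terms is non-empty and a subset of the prior's vocabulary.
    """
    posterior = {t: 0.0 for t in prior}
    for term, p in prior.items():
        if term in doc_terms:
            posterior[term] += p
            continue
        ranked = sorted(doc_terms, key=lambda d: similarity(term, d), reverse=True)[:k]
        weights = [similarity(term, d) for d in ranked]
        total = sum(weights)
        for d, w in zip(ranked, weights):
            # Fall back to an equal split if all similarities are zero.
            share = w / total if total > 0 else 1.0 / len(ranked)
            posterior[d] += p * share
    return posterior


def imaging_score(prior, doc_terms, query_terms, similarity, k=1):
    """Score a document as the transferred mass that lands on query terms."""
    posterior = image_on_document(prior, doc_terms, similarity, k)
    return sum(p for t, p in posterior.items() if t in query_terms)


# Hypothetical prior over a four-term vocabulary.
prior = {"cat": 0.4, "dog": 0.3, "car": 0.2, "bus": 0.1}
doc = {"cat", "car"}
query = {"cat"}

print(image_on_document(prior, doc, similarity))
print(imaging_score(prior, doc, query, similarity))
print(imaging_score(prior, doc, query, similarity, k=2))
```

In this sketch the total probability mass is conserved: imaging only relocates it within the term space, which is the kind of transfer the analysis refers to. Conditioning, by contrast, would renormalize the mass already on the document's terms without consulting term-term similarity.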
