Theories of information and uncertainty for the modelling of information retrieval: an application of Situation Theory and the Dempster-Shafer Theory of Evidence

Current information retrieval models offer only simplistic and specific representations of information. There is therefore a need for a new formalism able to model information retrieval systems in a more generic manner. In 1986, Van Rijsbergen suggested that such formalisms can be both appropriately and powerfully defined within a logic. The resulting formalism should capture information as it appears in an information retrieval system, and also in any of its inherent forms. The aim of this thesis is to understand the nature of information in information retrieval, and to propose a logic-based model of an information retrieval system that reflects this nature.

The first objective of this thesis is to identify the essential features of information in an information retrieval system. These are flow, intensionality, partiality, structure, significance, and uncertainty. It is shown that the first four features are qualitative, whereas the last two are quantitative, and that their modelling requires different frameworks: a theory of information and a theory of uncertainty, respectively.

The second objective of this thesis is to determine the appropriate framework for each type of feature, and to develop a method to combine them in a consistent fashion. The combination is based on the Transformation Principle. Many attempts have been made to derive an adequate definition of information. The one adopted in this thesis is based on that of Dretske, Barwise, and Devlin, who claimed that there is a primitive notion of information in terms of which a logic can be defined, and subsequently developed a theory of information, namely Situation Theory. Their approach is in accordance with Van Rijsbergen's suggestion of a logic-based formalism for modelling an information retrieval system. This thesis shows that Situation Theory is best at representing all the qualitative features. Regarding the quantitative features of information, this thesis shows that the framework that models them best is the Dempster-Shafer Theory of Evidence, together with the notion of refinement, later introduced by Shafer.

The third objective of this thesis is to develop a model of an information retrieval system based on Situation Theory and the Dempster-Shafer Theory of Evidence. This is done in two steps. First, the unstructured model is defined, in which the structure and the significance of information are not accounted for. Second, the unstructured model is extended into the structured model, which incorporates the structure and the significance of information. This strategy is adopted because it allows the flow of information to be represented carefully first.

The final objective of this thesis is to implement the model and to evaluate it empirically to assess its validity. The unstructured and the structured models are implemented on top of an existing on-line thesaurus, WordNet. The experiments performed to evaluate the two models use the National Physical Laboratory standard test collection. The experimental performance obtained was poor, because it was difficult to extract the flow of information from the document set. This was mainly due to the data used in the experimentation, which was inappropriate for the test collection. However, this thesis shows that if more appropriate data (for example, indexing tools and thesauri) were available, better performance would be obtained.

The conclusion of this work is that Situation Theory, combined with the Dempster-Shafer Theory of Evidence, allows the appropriate and powerful representation of several essential features of information in an information retrieval system. Although its implementation presents some difficulties, the model is the first of its kind to capture these features, in a general manner, within a uniform framework. As a result, it can be easily generalized to many types of information retrieval systems (e.g., interactive or multimedia systems) and to many aspects of the retrieval process (e.g., user modelling).
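
To illustrate the kind of quantitative reasoning that the Dempster-Shafer Theory of Evidence supports, the following is a minimal Python sketch of Dempster's rule of combination over a two-element frame of discernment. It is not the model developed in the thesis: the frame, the function names, and the mass values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping frozenset focal elements
    to masses that sum to 1 (an assumption of this sketch).
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass falling on the empty set
    if not combined:
        raise ValueError("total conflict: the evidence cannot be combined")
    # Renormalise so that the remaining masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass not contradicting the hypothesis."""
    return sum(v for s, v in m.items() if s & hypothesis)


# Hypothetical frame: a document is either relevant or not relevant.
R = frozenset({"relevant"})
N = frozenset({"not_relevant"})
FRAME = R | N

# Two hypothetical bodies of evidence; mass on FRAME expresses ignorance.
m1 = {R: 0.6, FRAME: 0.4}
m2 = {R: 0.5, N: 0.2, FRAME: 0.3}

m = combine(m1, m2)
print(belief(m, R), plausibility(m, R))
```

The gap between belief and plausibility for a hypothesis reflects the mass left uncommitted on the whole frame, which is how the theory separates ignorance from evidence against a hypothesis.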
