An inferential approach to Information Retrieval and its implementation using a manual thesaurus

Most inferential approaches to Information Retrieval (IR) have been investigated within the probabilistic framework. Although these approaches allow one to cope with the underlying uncertainty of inference in IR, the strict formalism of probability theory often confines the knowledge we can use to statistical knowledge alone (e.g. connections between terms based on their co-occurrences). Human-defined knowledge (e.g. manual thesauri) can be incorporated only with difficulty. In this paper, based on a general idea proposed by van Rijsbergen, we first develop an inferential approach within a fuzzy modal logic framework. Unlike previous approaches, ours emphasizes the logical component and treats it as the pillar of the approach. In addition, the flexibility of a fuzzy modal logic framework makes it possible to incorporate human-defined knowledge into the inference process. After defining the model, we describe a method for incorporating a human-defined thesaurus into inference by taking user relevance feedback into consideration. Experiments on the CACM corpus using WordNet, a general thesaurus of English, indicate a significant improvement in the system's performance.
