Using cause-effect relations in text to improve information retrieval precision

Abstract

This study attempted to use semantic relations expressed in text, in particular cause-effect relations, to improve information retrieval effectiveness. It investigated whether matching the cause-effect relations expressed in documents against those expressed in users' queries can improve document retrieval results, compared with keyword matching alone, which ignores relations. An automatic method for identifying and extracting cause-effect information from Wall Street Journal text was developed. Causal relation matching yielded a small but significant improvement in retrieval results when the weights used to combine the scores from the different types of matching were customized for each query. Causal relation matching did not perform better than word proximity matching (i.e., matching pairs of causally related words in the query with pairs of words that co-occur within document sentences), but the best results were obtained when causal relation matching was combined with word proximity matching. The most effective kind of causal relation matching was one in which one member of the causal relation (either the cause or the effect) was represented as a wildcard that could match any word.
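The abstract does not give the study's scoring formulas, so the Python sketch below only illustrates the matching types it names, under stated assumptions. Documents and queries are reduced to sets of stemmed words. Causal relations are assumed to be already extracted as hypothetical (cause, effect) word pairs. Each match type scores the fraction of query items matched, and the per-query weights are simply supplied by hand. All names (`Document`, `Query`, `causal_score`, `combined_score`, the `"*"` wildcard marker) are hypothetical and are not the study's implementation.

```python
from dataclasses import dataclass, field

WILDCARD = "*"  # hypothetical marker: this slot of a relation pattern matches any word


@dataclass
class Document:
    sentences: list[set[str]]  # each sentence reduced to a set of stemmed words
    relations: set[tuple[str, str]] = field(default_factory=set)  # extracted (cause, effect) pairs


@dataclass
class Query:
    keywords: set[str]
    relations: set[tuple[str, str]]  # (cause, effect) pairs stated in the query


def keyword_score(q: Query, d: Document) -> float:
    """Fraction of query keywords occurring anywhere in the document."""
    doc_words = set().union(*d.sentences)
    return len(q.keywords & doc_words) / len(q.keywords) if q.keywords else 0.0


def proximity_score(q: Query, d: Document) -> float:
    """Fraction of query cause/effect word pairs that co-occur in at least one
    sentence, whether or not the document links them causally."""
    if not q.relations:
        return 0.0
    hits = sum(
        any(cause in s and effect in s for s in d.sentences)
        for cause, effect in q.relations
    )
    return hits / len(q.relations)


def relation_matches(pattern: tuple[str, str], rel: tuple[str, str]) -> bool:
    """True if each slot of the pattern is the wildcard or equals the relation's word."""
    return all(p == WILDCARD or p == r for p, r in zip(pattern, rel))


def causal_score(q: Query, d: Document, wildcard: bool) -> float:
    """Fraction of query relations matched by some extracted document relation.
    With wildcard=True, (cause, effect) becomes the patterns (cause, *) and
    (*, effect), and a match on either pattern counts (a simplifying assumption)."""
    if not q.relations:
        return 0.0
    hits = 0
    for cause, effect in q.relations:
        patterns = [(cause, WILDCARD), (WILDCARD, effect)] if wildcard else [(cause, effect)]
        if any(relation_matches(p, r) for p in patterns for r in d.relations):
            hits += 1
    return hits / len(q.relations)


def combined_score(q: Query, d: Document, weights: dict[str, float]) -> float:
    """Weighted sum of the individual match scores; the weights are set per query."""
    scores = {
        "keyword": keyword_score(q, d),
        "proximity": proximity_score(q, d),
        "causal": causal_score(q, d, wildcard=False),
        "causal_wildcard": causal_score(q, d, wildcard=True),
    }
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


# Usage: a query asking whether interest rates affect housing sales.
q = Query(keywords={"interest", "rate", "housing", "sale"}, relations={("rate", "sale")})
doc_a = Document(
    sentences=[{"higher", "interest", "rate", "hurt", "housing", "sale"}],
    relations={("rate", "sale")},
)
doc_b = Document(
    sentences=[{"interest", "rate", "rise", "curb", "inflation"}, {"housing", "sale", "fell"}],
    relations={("rate", "inflation")},
)
weights = {"keyword": 1.0, "proximity": 0.5, "causal": 0.5, "causal_wildcard": 0.5}
print(combined_score(q, doc_a, weights))  # 2.5
print(combined_score(q, doc_b, weights))  # 1.5
```

Both documents contain every query keyword, so keyword matching alone cannot separate them. Proximity and exact causal matching favour `doc_a`. The wildcard variant still gives `doc_b` partial causal credit because it states that interest rates cause something, even though the effect differs from the query's. The weights are arbitrary placeholders, not values from the study.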