Heuristic Based Extraction of Causal Relations from Annotated Causal Cue Phrases

By Matthew J. Hausknecht

This work focuses on the detection and extraction of causal relations from open-domain text, starting from annotated Causal Cue Phrases (CCPs). It is argued that the problem of causality extraction should be decomposed into two distinct subtasks. First, the CCPs within a body of text must be identified. Second, using these CCPs, the cause and effect phrases of each causal relation must be extracted. To show that CCPs are an essential part of causality extraction, it is experimentally demonstrated that the accuracy of cause and effect phrase extraction increases dramatically when CCP knowledge is used: comparing two otherwise equivalent CRF machine learning algorithms, a 31% increase in accuracy is found when simple, word-based knowledge of CCPs is taken into account. Furthermore, it is shown that cause and effect phrase extraction can be performed accurately and robustly without the aid of complex machine learning techniques. A simple, heuristic-based extraction algorithm, centered on three distinct classes of CCPs, is introduced. This algorithm achieves an accuracy of 87% on the task of extracting cause and effect phrases. While the problem of identifying CCPs in open-domain text is not addressed, it is hypothesized that this task is far easier than identifying cause and effect phrases alone, because the space of all possible CCPs is far smaller than that of all causal relations. Finally, this work contributes a free, publicly accessible corpus explicitly annotated with both intra-sentential causal relations and their corresponding Causal Cue Phrases. It is our hope that this resource will see future use as a standard corpus for the task of causality extraction.
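To make the second subtask concrete, the following is a minimal sketch of what extraction from an already annotated CCP could look like. It is not the algorithm introduced in this work: the abstract does not specify the three CCP classes or their rules, so the function name `extract_cause_effect`, the `cause_first` flag, and the split-around-the-cue rule are all illustrative assumptions.

```python
from typing import List, Tuple


def extract_cause_effect(
    tokens: List[str],
    ccp_start: int,
    ccp_end: int,
    cause_first: bool,
) -> Tuple[str, str]:
    """Return (cause, effect) by splitting a sentence around a CCP.

    Hypothetical rule, NOT the work's heuristic: everything left of the
    annotated cue span [ccp_start, ccp_end) is one argument, everything
    right of it is the other. `cause_first` is an assumed per-cue flag
    saying whether the cause precedes the cue (e.g. "leads to") or
    follows it (e.g. "because of").
    """
    # Join the tokens on each side and trim stray spaces and punctuation.
    left = " ".join(tokens[:ccp_start]).strip(" ,.")
    right = " ".join(tokens[ccp_end:]).strip(" ,.")
    return (left, right) if cause_first else (right, left)


# Example sentence with the cue "because of" annotated at tokens 4..5.
tokens = "The game was cancelled because of heavy rain .".split()
cause, effect = extract_cause_effect(tokens, 4, 6, cause_first=False)
print(cause)
print(effect)
```

Even this single rule depends entirely on knowing where the cue phrase is and which way it points, which illustrates why the work treats CCP identification as a separate, prior subtask.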
