ClausIE: clause-based open information extraction

We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE differs fundamentally from previous approaches in that it separates the detection of "useful" pieces of information expressed in a sentence from their representation in terms of extractions. Specifically, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect the clauses in an input sentence and then identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text and on noisy text as found on the web.
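The separation between detecting a clause and representing it as an extraction can be illustrated with a minimal sketch. It is not the ClausIE implementation: it assumes the constituents of a clause have already been identified (in ClausIE this comes from a dependency parse, which is omitted here), and the `Clause` class, its field names, the seven type labels (SV, SVA, SVC, SVO, SVOO, SVOA, SVOC), and the two output formats are illustrative assumptions rather than anything stated in the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Clause:
    """Hypothetical container for the constituents of one detected clause."""
    subject: str
    verb: str
    direct_object: Optional[str] = None
    indirect_object: Optional[str] = None
    complement: Optional[str] = None
    # Assumption: something upstream (e.g. a lexicon lookup) has already
    # decided whether an adverbial is required by the verb or optional.
    obligatory_adverbial: Optional[str] = None
    optional_adverbials: List[str] = field(default_factory=list)


def clause_type(c: Clause) -> str:
    """Name the clause type from the grammatical functions that are present."""
    if c.direct_object and c.indirect_object:
        return "SVOO"
    if c.direct_object and c.complement:
        return "SVOC"
    if c.direct_object and c.obligatory_adverbial:
        return "SVOA"
    if c.direct_object:
        return "SVO"
    if c.complement:
        return "SVC"
    if c.obligatory_adverbial:
        return "SVA"
    return "SV"


def core_arguments(c: Clause) -> List[str]:
    """Constituents the clause needs in order to stay grammatical."""
    parts = [c.indirect_object, c.direct_object, c.complement,
             c.obligatory_adverbial]
    return [p for p in parts if p]


def as_triples(c: Clause) -> List[Tuple[str, str, str]]:
    """One representation: a core triple plus one triple per optional adverbial."""
    core = " ".join(core_arguments(c))
    triples = [(c.subject, c.verb, core)]
    for adv in c.optional_adverbials:
        triples.append((c.subject, c.verb, (core + " " + adv).strip()))
    return triples


def as_nary(c: Clause) -> Tuple[str, ...]:
    """Another representation of the same clause: a single n-ary tuple."""
    return (c.subject, c.verb, *core_arguments(c), *c.optional_adverbials)


if __name__ == "__main__":
    clause = Clause("Albert Einstein", "died",
                    optional_adverbials=["in Princeton", "in 1955"])
    print(clause_type(clause))
    print(as_triples(clause))
    print(as_nary(clause))
```

Both `as_triples` and `as_nary` read the same `Clause` object, so changing the output format for a different application does not touch the detection or typing step.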
