Open Language Learning for Information Extraction

Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, state-of-the-art Open IE systems such as ReVerb and WOE share two important weaknesses: (1) they extract only relations that are mediated by verbs, and (2) they ignore context, thus extracting tuples that are not asserted as factual. This paper presents OLLIE, a substantially improved Open IE system that addresses both these limitations. First, OLLIE achieves high yield by extracting relations mediated by nouns, adjectives, and more. Second, a context-analysis step increases precision by including contextual information from the sentence in the extractions. OLLIE obtains 2.7 times the area under the precision-yield curve (AUC) of ReVerb and 1.9 times that of WOEparse.
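
To make the evaluation metric concrete, the following is a minimal sketch of how an area under a precision-yield curve could be computed from confidence-ranked extractions. The function names, the toy scores, and the use of trapezoidal integration are assumptions made for illustration; the abstract does not specify how the curve is integrated.

```python
from typing import List, Tuple


def precision_yield_curve(scored: List[Tuple[float, bool]]) -> List[Tuple[int, float]]:
    """Return (yield, precision) points for extractions ranked by confidence.

    `scored` holds (confidence, is_correct) pairs. Yield is the number of
    correct extractions seen so far; precision is yield divided by rank.
    A point is recorded each time the yield increases.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    points = []
    correct = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            correct += 1
            points.append((correct, correct / rank))
    return points


def area_under_curve(points: List[Tuple[int, float]]) -> float:
    """Integrate precision over yield with the trapezoidal rule (an assumption)."""
    area = 0.0
    for (y0, p0), (y1, p1) in zip(points, points[1:]):
        area += (y1 - y0) * (p0 + p1) / 2.0
    return area


# Hypothetical extractions: (confidence, judged correct?)
toy = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
curve = precision_yield_curve(toy)
auc = area_under_curve(curve)
```

Because yield counts correct extractions rather than normalizing by a known total, a ratio of two such areas reflects both how many correct tuples a system finds and how precise it stays while finding them.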
