Open Information Extraction Using Wikipedia

Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning from relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system that improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors: heuristic matches between Wikipedia infobox attribute values and corresponding sentences are used to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to part-of-speech (POS) tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features, its precision and recall rise even higher.
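The idea of building training data from infobox matches can be illustrated with a minimal sketch. This is not WOE's actual matcher, whose heuristics the abstract does not specify. It assumes the simplest possible rule: a sentence becomes a training example when it contains both the article subject and an infobox attribute value as literal, case-insensitive substrings. The names `TrainingExample`, `split_sentences`, and `match_infobox`, and the toy article about a fictional "Acme Institute", are hypothetical and introduced only for illustration.

```python
import re
from typing import Dict, List, NamedTuple


class TrainingExample(NamedTuple):
    """One heuristically labeled sentence (hypothetical structure)."""
    sentence: str
    arg1: str       # the article subject
    attribute: str  # the infobox attribute name
    arg2: str       # the infobox attribute value


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_infobox(subject: str, infobox: Dict[str, str],
                  article_text: str) -> List[TrainingExample]:
    """Pair infobox values with sentences that mention them.

    Assumption: exact case-insensitive substring matching stands in for
    the richer heuristics a real system would need.
    """
    examples = []
    for sentence in split_sentences(article_text):
        lowered = sentence.lower()
        if subject.lower() not in lowered:
            continue  # require the subject to appear in the sentence
        for attribute, value in infobox.items():
            if value.lower() in lowered:
                examples.append(
                    TrainingExample(sentence, subject, attribute, value))
    return examples


# Toy data (fictional, for illustration only).
article = ("Acme Institute was founded by Jane Doe in 1950. "
           "It is a private school. "
           "The main campus of Acme Institute is in Riverton.")
infobox = {"founder": "Jane Doe", "location": "Riverton"}

for example in match_infobox("Acme Institute", infobox, article):
    print(example.attribute, "|", example.sentence)
```

Each matched sentence carries the positions of two arguments, so an extractor could then be trained on unlexicalized features (for example POS tags or dependency paths) between them. The second sentence in the toy article is skipped because it refers to the subject only by pronoun, which shows one limitation of plain substring matching.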
