Reducing Wrong Labels in Distant Supervision for Relation Extraction

In relation extraction, distant supervision uses a knowledge base, such as Freebase, as a source of supervision for extracting relations between entities from text. When a sentence and the knowledge base refer to the same entity pair, this approach heuristically labels the sentence with the corresponding relation in the knowledge base. However, the heuristic can fail, so some sentences are labeled wrongly, and the resulting noisy labeled data causes poor extraction performance. In this paper, we propose a method to reduce the number of wrong labels. We present a novel generative model that directly models the heuristic labeling process of distant supervision. The model predicts whether assigned labels are correct or wrong via its hidden variables. Our experimental results show that this model detected wrong labels with higher performance than baseline methods. We also found that our wrong label reduction boosted the performance of relation extraction.
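The labeling heuristic described above, and the way it can produce a wrong label, can be illustrated with a minimal sketch. This is not the paper's generative model; it only mimics the heuristic labeling step. The toy knowledge base, the example sentences, the relation name `place_of_birth`, and the function `label_sentences` are all assumptions made for illustration, and entity matching is reduced to plain substring search.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (entity1, entity2) -> relation.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "place_of_birth",
}

# Hypothetical example sentences.
SENTENCES: List[str] = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama visited Honolulu last week.",
    "Honolulu is the capital of Hawaii.",
]


def label_sentences(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that mentions both entities of a KB pair.

    Entity matching is a naive substring test (an assumption for this
    sketch); a real system would use proper entity recognition and linking.
    """
    labeled = []
    for text in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in text and e2 in text:
                labeled.append((text, e1, e2, relation))
    return labeled


labeled = label_sentences(SENTENCES, KB)
for text, e1, e2, relation in labeled:
    print(f"{relation}({e1}, {e2}) <- {text}")
```

In this sketch the first two sentences both receive the label `place_of_birth`, although the second one does not express that relation. It is this kind of wrongly labeled sentence that the proposed method aims to detect.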
