Distant Supervision for Relation Extraction via Group Selection

Distant supervision (DS) aligns relations between named entities from a knowledge base (KB) with free text and automatically annotates the training corpus with relation mentions. A major challenge of DS is that the heuristically generated relation labels tend to be noisy, particularly when a pair of entities has multiple and/or incomplete relations in the KB. This paper proposes two ranking-based methods to reduce noise and select effective training data for multi-instance multi-label learning (MIML), one of the most popular learning paradigms for distantly supervised relation extraction. With the proposed methods, low-quality training groups are excluded from the training data according to different ranking strategies. Experiments on the KBP dataset with state-of-the-art MIML algorithms show that the proposed methods improve performance significantly.
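The general idea of labeling by alignment and then filtering groups by rank can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the two ranking strategies, so the scoring function here (mention count) is a placeholder assumption, and the names build_groups, select_groups, and keep_ratio, as well as the toy KB and sentences, are hypothetical.

```python
from collections import defaultdict

def build_groups(kb, sentences):
    """Distant-supervision heuristic: every sentence that mentions both
    entities of a KB pair joins that pair's group, and the group inherits
    all relations the KB lists for the pair (which may be noisy)."""
    groups = defaultdict(lambda: {"mentions": [], "labels": set()})
    for (e1, e2), relations in kb.items():
        for sentence in sentences:
            if e1 in sentence and e2 in sentence:
                group = groups[(e1, e2)]
                group["mentions"].append(sentence)
                group["labels"] = set(relations)
    return dict(groups)

def select_groups(groups, score, keep_ratio):
    """Rank groups by a quality score and keep only the top fraction.
    `score` is a stand-in for whatever ranking strategy is used."""
    ranked = sorted(groups, key=lambda pair: score(groups[pair]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return {pair: groups[pair] for pair in ranked[:n_keep]}

# Toy data (hypothetical).
kb = {
    ("Obama", "Hawaii"): ["born_in"],
    ("Jobs", "Apple"): ["founder_of"],
}
sentences = [
    "Obama was born in Hawaii.",
    "Obama visited Hawaii last year.",
    "Jobs returned to Apple.",
]

groups = build_groups(kb, sentences)
# Placeholder score: number of mentions in the group (an assumption).
selected = select_groups(groups, score=lambda g: len(g["mentions"]), keep_ratio=0.5)
```

Note that the second Obama sentence does not express born_in yet still receives that label, which is the kind of noise group selection aims to reduce. The retained groups would then be passed to an MIML learner in place of the full training set.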
