Relation Extraction Using Distant Supervision

Relation extraction is a subtask of information extraction in which semantic relationships are extracted from natural language text and then classified. In essence, it allows us to acquire structured knowledge from unstructured text. In this article, we survey relation extraction methods that leverage pre-existing structured or semi-structured data to guide the extraction process. We introduce a taxonomy of existing methods and describe distant supervision approaches in detail. We also describe the evaluation methodologies and datasets commonly used for quality assessment. Finally, we give a high-level outlook on the field, highlighting open problems and the most promising research directions.
