Distant Supervision for Relation Extraction beyond the Sentence Boundary

The growing demand for structured knowledge has led to great interest in relation extraction, especially in cases with limited supervision. However, existing distant supervision approaches only extract relations expressed in single sentences. In general, cross-sentence relation extraction is under-explored, even in the supervised-learning setting. In this paper, we propose the first approach for applying distant supervision to cross-sentence relation extraction. At the core of our approach is a graph representation that can incorporate both standard dependencies and discourse relations, thus providing a unifying way to model relations within and across sentences. We extract features from multiple paths in this graph, increasing accuracy and robustness when confronted with linguistic variation and analysis error. Experiments on an important extraction task for precision medicine show that our approach can learn an accurate cross-sentence extractor, using only a small existing knowledge base and unlabeled text from biomedical research articles. Compared to the existing distant supervision paradigm, our approach extracted twice as many relations at similar precision, thus demonstrating the prevalence of cross-sentence relations and the promise of our approach.
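The graph idea above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy two-sentence example, the edge labels (`next_sent`, `coref`, and the dependency names), and the helper functions `build_graph` and `simple_paths` are all assumptions made for illustration. The sketch links tokens from two sentences in one graph and reads off the label sequence of every simple path between two entity mentions, so that several paths, not just one, can serve as features.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency list; each edge is stored in both directions,
    with an "-inv" suffix marking traversal against the edge direction."""
    graph = defaultdict(list)
    for src, dst, label in edges:
        graph[src].append((dst, label))
        graph[dst].append((src, label + "-inv"))
    return graph


def simple_paths(graph, start, goal, max_len):
    """Return the edge-label sequences of all simple paths from start to
    goal that use at most max_len edges, shortest first."""
    results = []
    stack = [(start, (start,), ())]
    while stack:
        node, visited, labels = stack.pop()
        if node == goal:
            results.append(labels)
            continue
        if len(labels) == max_len:
            continue
        for nxt, label in graph[node]:
            if nxt not in visited:
                stack.append((nxt, visited + (nxt,), labels + (label,)))
    return sorted(results, key=lambda p: (len(p), p))


# Hypothetical toy document (not from the paper):
#   S1: "Tumors with X were tested."   S2: "They responded to Y."
edges = [
    ("tested", "tumors", "nsubjpass"),     # dependency edge in S1
    ("tumors", "X", "prep_with"),          # dependency edge in S1
    ("responded", "They", "nsubj"),        # dependency edge in S2
    ("responded", "Y", "prep_to"),         # dependency edge in S2
    ("tested", "responded", "next_sent"),  # assumed cross-sentence link between roots
    ("tumors", "They", "coref"),           # assumed discourse-level link
]

graph = build_graph(edges)
paths = simple_paths(graph, "X", "Y", max_len=4)

# One feature string per path between the two mentions.
features = ["path:" + ">".join(p) for p in paths]
for feature in features:
    print(feature)
```

In this toy graph the two mentions are connected by two distinct paths, one through the assumed `coref` link and one through the assumed `next_sent` link. If an upstream analysis step misses one of those links, the other path still yields a feature, which is the kind of robustness the abstract attributes to using multiple paths.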
