Leveraging Linguistic Structure For Open Domain Information Extraction

Relation triples produced by open domain information extraction (open IE) systems are useful for question answering, inference, and other IE tasks. Traditionally, these triples are extracted using a large set of patterns; however, this approach is brittle on out-of-domain text and on long-range dependencies, and it gives no insight into the substructure of the arguments. We replace this large pattern set with a few patterns for canonically structured sentences, and shift the focus to a classifier that learns to extract self-contained clauses from longer sentences. We then run natural logic inference over these short clauses to determine the maximally specific arguments for each candidate triple. We show that our approach outperforms a state-of-the-art open IE system on the end-to-end TAC-KBP 2013 Slot Filling task.
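
To make the argument step concrete, the following is a minimal toy sketch and not the system described in the abstract. It assumes a short clause has already been extracted and that each argument token carries a hand-set `droppable` flag standing in for what natural logic inference would decide; the names `variants` and `candidate_triples` are hypothetical. It enumerates the candidate triples obtained by keeping or dropping each droppable modifier.

```python
from itertools import product
from typing import List, Tuple

# (word, droppable). The droppable flags are hand-set assumptions here;
# in a real system they would come from an inference step, not from the user.
Token = Tuple[str, bool]


def variants(tokens: List[Token]) -> List[str]:
    """Return every version of an argument made by keeping or dropping
    each droppable token; non-droppable tokens are always kept."""
    choices = [
        ((word,), ()) if droppable else ((word,),)
        for word, droppable in tokens
    ]
    results = []
    for combo in product(*choices):
        words = [w for part in combo for w in part]
        results.append(" ".join(words))
    return results


def candidate_triples(
    subject: List[Token], relation: str, obj: List[Token]
) -> List[Tuple[str, str, str]]:
    """Combine all subject variants with all object variants."""
    return [(s, relation, o) for s in variants(subject) for o in variants(obj)]


if __name__ == "__main__":
    subject = [("she", False)]
    obj = [("midnight", True), ("train", False)]
    for triple in candidate_triples(subject, "took", obj):
        print(triple)
```

In this toy setting the hard part is deliberately left out: deciding which modifiers may be dropped without changing the truth of the clause is exactly what the inference step is for, and the clause itself would come from the classifier rather than being supplied by hand.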
