CoRI: Collective Relation Integration with Data Augmentation for Open Information Extraction

Integrating knowledge extracted from the Web into knowledge graphs (KGs) can facilitate tasks such as question answering. We study relation integration, which aims to align free-text relations in subject-relation-object extractions with relations in a target KG. To address the ambiguity of free-text relations, previous methods exploit neighboring entities and relations as additional context. However, these methods make each prediction independently, so the predictions can be mutually inconsistent. We propose a two-stage Collective Relation Integration (CoRI) model: the first stage makes candidate predictions independently, and the second stage employs a collective model that accesses all candidate predictions to make globally coherent predictions. We further improve the collective model with augmented data from the portion of the target KG that is otherwise unused. Experimental results on two datasets show that CoRI significantly outperforms the baselines, improving AUC from .677 to .748 and from .716 to .780, respectively.
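The two-stage idea can be illustrated with a toy sketch. This is not the CoRI model itself, which per the abstract is a learned collective model; here the second stage is replaced by a simple hand-written averaging rule. The extractions, the scores, the `weight` parameter, and all function names below are hypothetical and chosen only to show how independent candidate predictions can disagree and how access to all candidates can reconcile them.

```python
from collections import defaultdict

# Toy extractions: (subject, free-text relation, object). All data is made up.
extractions = [
    ("Obama", "was born in", "Honolulu"),
    ("Obama", "is from", "Honolulu"),
]

# Hypothetical stage-1 scores over target KG relations for each free-text relation.
STAGE1_SCORES = {
    "was born in": {"place_of_birth": 0.90, "nationality": 0.10},
    "is from": {"place_of_birth": 0.45, "nationality": 0.55},
}


def stage1(extraction):
    """Stage 1: score candidate KG relations for one extraction, independently."""
    _, relation, _ = extraction
    return dict(STAGE1_SCORES[relation])


def stage2(extractions, candidates, weight=0.5):
    """Stage 2 (assumed rule): mix each candidate score with the mean score
    that the same KG relation receives from extractions sharing the entity pair."""
    by_pair = defaultdict(list)
    for i, (subj, _, obj) in enumerate(extractions):
        by_pair[(subj, obj)].append(i)

    final = []
    for i, (subj, _, obj) in enumerate(extractions):
        neighbors = [j for j in by_pair[(subj, obj)] if j != i]
        rescored = {}
        for rel, score in candidates[i].items():
            support = 0.0
            if neighbors:
                support = sum(candidates[j].get(rel, 0.0) for j in neighbors) / len(neighbors)
            rescored[rel] = (1 - weight) * score + weight * support
        final.append(max(rescored, key=rescored.get))
    return final


candidates = [stage1(e) for e in extractions]
independent = [max(c, key=c.get) for c in candidates]
collective = stage2(extractions, candidates)
print(independent)
print(collective)
```

In this toy data the independent predictions disagree for the same entity pair, while the collective pass assigns both extractions the same KG relation. A learned second stage would replace the fixed averaging rule with a model trained on the candidate predictions.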
