Open Information Extraction with Global Structure Constraints

Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. However, current open IE systems ignore the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel open IE system, called ReMine, which integrates local context signal and global structural signal in a unified framework with distant supervision. The new system can be efficiently applied to different domains as it uses facts from external knowledge bases as supervision; and can effectively score sentence-level tuple extractions based on corpus-level statistics. Specifically, we design a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of sentence-level extractions with a translating-based objective. Experiments on real-world corpora from different domains demonstrate the effectiveness and robustness of ReMine when compared to other open IE systems.

[1]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[2]  Razvan C. Bunescu,et al.  Learning to Extract Relations from the Web using Minimal Supervision , 2007, ACL.

[3]  Armen E. Allahverdyan,et al.  Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs , 2011, NIPS.

[4]  Zhen Wang,et al.  Knowledge Graph Embedding by Translating on Hyperplanes , 2014, AAAI.

[5]  Oren Etzioni,et al.  Open Language Learning for Information Extraction , 2012, EMNLP.

[6]  Shaowen Wang,et al.  GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams , 2016, SIGIR.

[7]  Wei Zhang,et al.  Knowledge vault: a web-scale approach to probabilistic knowledge fusion , 2014, KDD.

[8]  Danqi Chen,et al.  Reasoning With Neural Tensor Networks for Knowledge Base Completion , 2013, NIPS.

[9]  Jens Lehmann,et al.  DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia , 2015, Semantic Web.

[10]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[11]  Li Guo,et al.  Context-Dependent Knowledge Graph Embedding , 2015, EMNLP.

[12]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[13]  Jason Weston,et al.  Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[14]  Andrew McCallum,et al.  Relation Extraction with Matrix Factorization and Universal Schemas , 2013, NAACL.

[15]  Jiawei Han,et al.  Automated Phrase Mining from Massive Text Corpora , 2017, IEEE Transactions on Knowledge and Data Engineering.

[16]  Oren Etzioni,et al.  Open question answering over curated and extracted knowledge bases , 2014, KDD.

[17]  Jiawei Han,et al.  Mining Quality Phrases from Massive Text Corpora , 2015, SIGMOD Conference.

[18]  Zhiyuan Liu,et al.  Learning Entity and Relation Embeddings for Knowledge Graph Completion , 2015, AAAI.

[19]  Oren Etzioni,et al.  Identifying Relations for Open Information Extraction , 2011, EMNLP.

[20]  Xiang Ren,et al.  Empower Sequence Labeling with Task-Aware Neural Language Model , 2017, AAAI.

[21]  Maria T. Pazienza,et al.  Information Extraction , 2002, Lecture Notes in Computer Science.

[22]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[23]  Ming-Wei Chang,et al.  Open Domain Question Answering via Semantic Enrichment , 2015, WWW.

[24]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[25]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[26]  John Miller,et al.  Traversing Knowledge Graphs in Vector Space , 2015, EMNLP.

[27]  Christopher Ré,et al.  DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference , 2012, VLDS.

[28]  Luciano Del Corro,et al.  ClausIE: clause-based open information extraction , 2013, WWW.

[29]  Pablo N. Mendes,et al.  Improving efficiency and accuracy in multilingual entity extraction , 2013, I-SEMANTICS '13.

[30]  Christopher D. Manning,et al.  Leveraging Linguistic Structure For Open Domain Information Extraction , 2015, ACL.

[31]  Denilson Barbosa,et al.  Open Information Extraction with Tree Kernels , 2013, NAACL.

[32]  Alexander M. Rush,et al.  Abstractive Sentence Summarization with Attentive Recurrent Neural Networks , 2016, NAACL.

[33]  Jiawei Han,et al.  Phrase Mining from Massive Text and Its Applications , 2017, Synthesis Lectures on Data Mining and Knowledge Discovery.

[34]  Sameer Singh,et al.  Injecting Logical Background Knowledge into Embeddings for Relation Extraction , 2015, NAACL.

[35]  Daniel S. Weld,et al.  Fine-Grained Entity Recognition , 2012, AAAI.

[36]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[37]  Pablo Gamallo,et al.  Dependency-Based Open Information Extraction , 2012 .

[38]  Luciano Del Corro,et al.  MinIE: Minimizing Facts in Open Information Extraction , 2017, EMNLP.

[39]  Xinlei Chen,et al.  Never-Ending Learning , 2012, ECAI.

[40]  Estevam R. Hruschka,et al.  Coupled semi-supervised learning for information extraction , 2010, WSDM '10.

[41]  Eduard H. Hovy,et al.  End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF , 2016, ACL.

[42]  Michael Gamon,et al.  Representing Text for Joint Embedding of Text and Knowledge Bases , 2015, EMNLP.