Can We Predict New Facts with Open Knowledge Graph Embeddings? A Benchmark for Open Link Prediction

Open Information Extraction systems extract (“subject text”, “relation text”, “object text”) triples from raw text. Some triples are textual versions of facts, i.e., non-canonicalized mentions of entities and relations. In this paper, we investigate whether it is possible to infer new facts directly from the open knowledge graph without any canonicalization or any supervision from curated knowledge. For this purpose, we propose the open link prediction task, i.e., predicting test facts by completing (“subject text”, “relation text”, ?) questions. An evaluation in such a setup raises the question of whether a correct prediction is actually a new fact induced by reasoning over the open knowledge graph or whether it can be trivially explained. For example, facts can appear in different paraphrased textual variants, which can lead to test leakage. To this end, we propose an evaluation protocol and a methodology for creating the open link prediction benchmark OLPBENCH. We performed experiments with a prototypical knowledge graph embedding model for open link prediction. While the task is very challenging, our results suggest that it is possible to predict genuinely new facts that cannot be trivially explained.
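To make the setup concrete, below is a minimal sketch of open link prediction over non-canonicalized triples. The data (example mentions), the bag-of-tokens mention encoder, the random token embeddings, and the DistMult-style scoring function are illustrative assumptions, not the benchmark model described in the paper; the actual model may compose mention embeddings differently (e.g., with a recurrent encoder) and be trained rather than randomly initialized.

```python
# Sketch of open link prediction: answer ("subject text", "relation text", ?)
# by ranking candidate object mentions with an embedding-based scorer.
# All modeling choices here are illustrative assumptions.
import numpy as np

DIM = 8
rng = np.random.default_rng(0)

# Open KG: non-canonicalized ("subject text", "relation text", "object text") triples.
open_triples = [
    ("NBC-TV", "has office in", "New York"),
    ("NBC", "has headquarters in", "New York City"),
    ("Tim Cook", "is chief executive of", "Apple"),
]

def tokens(mention):
    return mention.lower().split()

# Token vocabulary with random embeddings (these would be learned in practice).
vocab = sorted({t for triple in open_triples for mention in triple for t in tokens(mention)})
token_emb = {t: rng.normal(size=DIM) for t in vocab}

def encode(mention):
    """Compose a mention embedding from its tokens (here: mean pooling)."""
    vecs = [token_emb[t] for t in tokens(mention) if t in token_emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

def score(subj, rel, obj):
    """DistMult-style score: sum of the elementwise product of the three embeddings."""
    return float(np.sum(encode(subj) * encode(rel) * encode(obj)))

def answer(subj, rel, candidates):
    """Rank candidate object mentions for an open query (subj, rel, ?)."""
    return sorted(candidates, key=lambda o: score(subj, rel, o), reverse=True)

candidates = [o for _, _, o in open_triples]
print(answer("NBC-TV", "has office in", candidates))
```

Note that the query subject “NBC-TV” and the stored subject “NBC” are different surface forms of the same entity; evaluating whether a correct answer reflects genuine inference or merely such paraphrase overlap (test leakage) is exactly what the proposed evaluation protocol is designed to control for.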
