What is Learned in Knowledge Graph Embeddings?

A knowledge graph (KG) is a data structure that represents entities and relations as the vertices and edges of a directed graph with edge types. KGs are an important primitive in modern machine learning and artificial intelligence. Embedding-based models, such as the seminal TransE [Bordes et al., 2013] and the recent PairRE [Chao et al., 2020], are among the most popular and successful approaches for representing KGs and inferring missing edges (link completion). Their relative success is often credited in the literature to their ability to learn logical rules between the relations. In this work, we investigate whether learning rules between relations is indeed what drives the performance of embedding-based methods. We define motif learning and two alternative mechanisms: network learning (based only on the connectivity of the KG, ignoring the relation types) and unstructured statistical learning (ignoring the connectivity of the graph). Using experiments on synthetic KGs, we show that KG models can learn motifs and how this ability is degraded by non-motif (noise) edges. We propose tests to distinguish the contributions of the three mechanisms to performance, and apply them to popular KG benchmarks. We also discuss an issue with the standard performance testing protocol and suggest an improvement.
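To make the embedding-based setting concrete, here is a minimal sketch of the TransE scoring idea from Bordes et al. [2013]: a triple (head, relation, tail) is considered plausible when the head embedding translated by the relation embedding lands near the tail embedding. The entity and relation names, the embedding dimension, and the random (untrained) vectors below are purely illustrative assumptions; a real model would learn these vectors by gradient descent on observed triples.

```python
import numpy as np

# Illustrative toy vocabulary; in practice these come from the KG's
# entity and relation sets, and the vectors are learned, not random.
rng = np.random.default_rng(0)
dim = 8
entities = {name: rng.normal(size=dim) for name in ["paris", "france", "berlin"]}
relations = {"capital_of": rng.normal(size=dim)}

def transe_score(head: str, relation: str, tail: str) -> float:
    """TransE plausibility: -||h + r - t||. Higher (closer to 0) = more plausible."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return float(-np.linalg.norm(h + r - t))

# Link completion as ranking: score every candidate tail for a query
# (paris, capital_of, ?) and sort by descending plausibility.
scores = {e: transe_score("paris", "capital_of", e) for e in entities}
ranking = sorted(scores, key=scores.get, reverse=True)
```

The tests proposed in the paper hinge on what such a scorer can exploit: relation-specific structure (motifs), bare connectivity, or per-entity statistics alone.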

[1] Gerhard Weikum et al. YAGO 4: A Reason-able Knowledge Base. ESWC, 2020.

[2] Guillaume Bouchard et al. Complex Embeddings for Simple Link Prediction. ICML, 2016.

[3] Noga Alon et al. Finding a large hidden clique in a random graph. SODA '98, 1998.

[4] Zhiyuan Liu et al. OpenKE: An Open Toolkit for Knowledge Embedding. EMNLP, 2018.

[5] Jure Leskovec et al. Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings. ICLR, 2020.

[6] Sanjoy Dasgupta et al. What relations are reliably embeddable in Euclidean space? ALT, 2019.

[7] Joshua B. Tenenbaum et al. Modelling Relational Data using Bayesian Clustered Tensor Factorization. NIPS, 2009.

[8] Chengkai Li et al. Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study. SIGMOD, 2020.

[9] Rudolf Kadlec et al. Knowledge Base Completion: Baselines Strike Back. Rep4NLP@ACL, 2017.

[10] J. Heinonen. Lectures on Analysis on Metric Spaces. 2000.

[11] Jianfeng Gao et al. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. ICLR, 2014.

[12] Timothy M. Hospedales et al. TuckER: Tensor Factorization for Knowledge Graph Completion. EMNLP, 2019.

[13] Taifeng Wang et al. PairRE: Knowledge Graph Embeddings via Paired Relation Vectors. ACL, 2020.

[14] Mausam et al. Knowledge Base Completion: Baseline strikes back (Again). arXiv, 2020.

[15] Pouya Pezeshkpour et al. Revisiting Evaluation of Knowledge Base Completion Models. AKBC, 2020.

[16] J. Leskovec et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs. NeurIPS, 2020.

[17] Thomas Lukasiewicz et al. BoxE: A Box Embedding Model for Knowledge Base Completion. NeurIPS, 2020.

[18] Jason Weston et al. Translating Embeddings for Modeling Multi-relational Data. NIPS, 2013.

[19] Jens Lehmann et al. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 2015.

[20] Halil Kilicoglu et al. SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics, 2012.

[21] Jian-Yun Nie et al. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. ICLR, 2018.

[22] Radford M. Neal. Pattern Recognition and Machine Learning. Technometrics, 2007.

[23] Aditya Sharma et al. Towards Understanding the Geometry of Knowledge Graph Embeddings. ACL, 2018.

[24] Timothy M. Hospedales et al. Interpreting Knowledge Graph Relation Representation from Word Embeddings. ICLR, 2021.