Revisiting Evaluation of Knowledge Base Completion Models

Representing knowledge graphs (KGs) by learning embeddings for entities and relations has led to accurate models for existing KG completion benchmarks. However, due to the open-world assumption of existing KGs, evaluation of KG completion uses ranking metrics and triple classification with negative samples, and is thus unable to directly assess models on the goal of the task: completion. In this paper, we first study the shortcomings of these evaluation metrics. Specifically, we demonstrate that these metrics (1) are unreliable for estimating how calibrated the models are, (2) make strong assumptions that are often violated, and (3) do not sufficiently and consistently differentiate embedding methods from each other, or from simpler approaches. To address these issues, we gather a semi-complete KG, referred to as YAGO3-TC, using a random subgraph from the test and validation data of YAGO3-10, which enables us to compute accurate triple classification accuracy on this data. Conducting thorough experiments on existing models, we provide new insights and directions for KG completion research. Along with the dataset and the open-source implementation of the models, we also provide a leaderboard for knowledge graph completion that consists of a hidden, and growing, test set, available at https://pouyapez.github.io/yago3-tc/.
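
For readers unfamiliar with the two evaluation styles contrasted above, the following is a minimal sketch of how ranking metrics (filtered rank, MRR, Hits@k) and thresholded triple classification accuracy are commonly computed. It is not the paper's implementation: all function names, the toy scores, and the single fixed threshold are assumptions chosen for illustration.

```python
# Illustrative sketch only; names and toy data are hypothetical.

def rank_of_true(scores, true_idx, known_true=()):
    """Filtered rank of the true candidate (1 = best).

    scores:     model score for every candidate entity (higher = more plausible)
    true_idx:   index of the held-out correct entity
    known_true: indices of other candidates already known to be correct;
                they are skipped so they do not count against the model
    """
    target = scores[true_idx]
    skip = set(known_true) - {true_idx}
    better = sum(1 for i, s in enumerate(scores) if i not in skip and s > target)
    return better + 1


def mrr(ranks):
    """Mean reciprocal rank over a list of ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def classification_accuracy(scores, labels, threshold):
    """Triple classification: predict 'true' when score >= threshold."""
    correct = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == bool(y))
    return correct / len(scores)


if __name__ == "__main__":
    candidate_scores = [0.9, 0.2, 0.7, 0.4]
    print(rank_of_true(candidate_scores, true_idx=2))                   # raw rank
    print(rank_of_true(candidate_scores, true_idx=2, known_true=(0,)))  # filtered rank
    print(mrr([1, 2, 4]), hits_at_k([1, 2, 4], k=1))
    print(classification_accuracy(candidate_scores, [1, 0, 0, 1], threshold=0.5))
```

Note that the ranking functions only compare scores against each other, whereas classification accuracy depends on the absolute score relative to a threshold. This is why a model can rank well and still be poorly calibrated, which is the gap that labeled true and false triples are meant to expose.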
