Out-of-Vocabulary Entities in Link Prediction

Knowledge graph embedding techniques are key to making knowledge graphs amenable to the plethora of machine learning approaches based on vector representations. Link prediction is often used as a proxy to evaluate the quality of these embeddings. Because creating benchmarks for link prediction is time-consuming, most work on the subject uses only a few benchmarks. As benchmarks are crucial for the fair comparison of algorithms, ensuring their quality is tantamount to providing solid ground for developing better solutions to link prediction and, ipso facto, to embedding knowledge graphs. First studies of benchmarks pointed to limitations pertaining to information leaking from the development to the test fragments of some benchmark datasets. We spotted a further limitation shared by three of the benchmarks commonly used for evaluating link prediction approaches: out-of-vocabulary entities in the test and validation sets, that is, entities that do not occur in the training set. We provide an implementation of an approach for spotting and removing such entities and provide corrected versions of the datasets WN18RR, FB15K-237, and YAGO3-10. Our experiments on the corrected versions of WN18RR, FB15K-237, and YAGO3-10 suggest that the measured performance of state-of-the-art approaches is altered significantly, with p-values < 1%, < 1.4%, and < 1%, respectively. Overall, state-of-the-art approaches gain on average 3.29 ± 0.24% absolute across all metrics on WN18RR. This means that some of the conclusions drawn in previous works might need to be revisited. We provide an open-source implementation of our experiments and the corrected datasets at https://github.com/dice-group/OOV-In-Link-Prediction.
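
The idea of spotting and removing out-of-vocabulary entities can be illustrated with a minimal sketch. It is not the implementation from the repository above; it assumes that triples are given as (head, relation, tail) string tuples and that an entity counts as out-of-vocabulary when it never appears in the training split. The function names `entity_vocabulary` and `split_oov` and the toy triples are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: a triple is (head, relation, tail), all given as strings.
Triple = Tuple[str, str, str]


def entity_vocabulary(train: Iterable[Triple]) -> Set[str]:
    """Collect every entity that occurs as head or tail in the training split."""
    vocab: Set[str] = set()
    for head, _, tail in train:
        vocab.add(head)
        vocab.add(tail)
    return vocab


def split_oov(
    triples: Iterable[Triple], vocab: Set[str]
) -> Tuple[List[Triple], List[Triple]]:
    """Separate triples whose entities are all known from those with an unseen entity."""
    kept: List[Triple] = []
    removed: List[Triple] = []
    for triple in triples:
        head, _, tail = triple
        if head in vocab and tail in vocab:
            kept.append(triple)
        else:
            removed.append(triple)
    return kept, removed


# Toy data (hypothetical): "d" never occurs in the training split.
train = [("a", "r", "b"), ("b", "r", "c")]
test = [("a", "r", "c"), ("a", "r", "d")]

vocab = entity_vocabulary(train)
kept, removed = split_oov(test, vocab)
```

Applied to the validation and test splits of a benchmark, `kept` would form the corrected split, while `removed` lists the triples a model could not have learned an embedding for.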
