NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding

Knowledge graph (KG) embedding is a fundamental problem in data mining research with many real-world applications. It aims to encode the entities and relations in the graph into low dimensional vector space, which can be used for subsequent algorithms. Negative sampling, which samples negative triplets from non-observed ones in the training data, is an important step in KG embedding. Recently, generative adversarial network (GAN), has been introduced in negative sampling. By sampling negative triplets with large scores, these methods avoid the problem of vanishing gradient and thus obtain better performance. However, using GAN makes the original model more complex and harder to train, where reinforcement learning must be used. In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to directly keep track of them with cache. However, how to sample from and update the cache are two important questions. We carefully design the solutions, which are not only efficient but also achieve good balance between exploration and exploitation. In this way, our method acts as a "distilled" version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. The extensive experiments show that our method can gain significant improvement on various KG embedding models, and outperform the state-of-the-arts negative sampling methods based on GAN.

[1]  Heng Huang,et al.  Self-Paced Network Embedding , 2018, KDD.

[2]  Andrew McCallum,et al.  Introduction to Statistical Relational Learning , 2007 .

[3]  Jun Zhao,et al.  Knowledge Graph Embedding via Dynamic Mapping Matrix , 2015, ACL.

[4]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[5]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[6]  Guillaume Bouchard,et al.  Complex Embeddings for Simple Link Prediction , 2016, ICML.

[7]  Geoffrey Zweig,et al.  Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.

[8]  Yiming Yang,et al.  Analogical Inference for Multi-relational Embeddings , 2017, ICML.

[9]  Lorenzo Rosasco,et al.  Holographic Embeddings of Knowledge Graphs , 2015, AAAI.

[10]  Lars Schmidt-Thieme,et al.  Predicting RDF triples in incomplete knowledge bases with tensor factorization , 2012, SAC '12.

[11]  Douglas A. White The Knowledge-based Software Assistant: A Program Summary , 1991, Proceedings., 6th Annual Knowledge-Based Software Engineering Conference.

[12]  Miao Fan,et al.  Transition-based Knowledge Graph Embedding with Relational Mapping Properties , 2014, PACLIC.

[14]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[15]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[16]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[17]  Jason Weston,et al.  A semantic matching energy function for learning with multi-relational data , 2013, Machine Learning.

[18]  Danqi Chen,et al.  Observed versus latent features for knowledge base and text inference , 2015, CVSC.

[19]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Daphne Koller,et al.  Self-Paced Learning for Latent Variable Models , 2010, NIPS.

[22]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[23]  Jason Weston,et al.  Question Answering with Subgraph Embeddings , 2014, EMNLP.

[24]  Trevor Hastie,et al.  An Introduction to Statistical Learning , 2013, Springer Texts in Statistics.

[25]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[26]  Zhen Wang,et al.  Knowledge Graph Embedding by Translating on Hyperplanes , 2014, AAAI.

[27]  William Yang Wang,et al.  KBGAN: Adversarial Learning for Knowledge Graph Embeddings , 2017, NAACL.

[28]  Rong Pan,et al.  Incorporating GAN for Negative Sampling in Knowledge Representation Learning , 2018, AAAI.

[29]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Jianfeng Gao,et al.  Embedding Entities and Relations for Learning and Inference in Knowledge Bases , 2014, ICLR.

[31]  Han Xiao,et al.  From One Point to a Manifold: Knowledge Graph Embedding for Precise Link Prediction , 2015, IJCAI.

[32]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[33]  Pedro M. Domingos,et al.  Statistical predicate invention , 2007, ICML '07.

[34]  Jason Weston,et al.  Open Question Answering with Weakly Supervised Embedding Models , 2014, ECML/PKDD.

[35]  Zhendong Mao,et al.  Knowledge Graph Embedding: A Survey of Approaches and Applications , 2017, IEEE Transactions on Knowledge and Data Engineering.

[36]  Jason Weston,et al.  Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[37]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[38]  Zhiyuan Liu,et al.  Learning Entity and Relation Embeddings for Knowledge Graph Completion , 2015, AAAI.

[39]  Jun Zhao,et al.  Knowledge Graph Completion with Adaptive Sparse Transfer Matrix , 2016, AAAI.

[40]  Evgeniy Gabrilovich,et al.  A Review of Relational Machine Learning for Knowledge Graphs , 2015, Proceedings of the IEEE.

[41]  Rainer Gemulla,et al.  On Evaluating Embedding Models for Knowledge Base Completion , 2018, RepL4NLP@ACL.

[42]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[43]  Tom M. Mitchell,et al.  Random Walk Inference and Learning in A Large Scale Knowledge Base , 2011, EMNLP.

[44]  Omer Levy,et al.  word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method , 2014, ArXiv.

[45]  Guillaume Bouchard,et al.  Knowledge Graph Completion via Complex Tensor Factorization , 2017, J. Mach. Learn. Res..

[46]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[47]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[48]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[49]  Hans-Peter Kriegel,et al.  A Three-Way Model for Collective Learning on Multi-Relational Data , 2011, ICML.