Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings

Answering complex logical queries on large-scale incomplete knowledge graphs (KGs) is a fundamental yet challenging task. Recently, a promising approach to this problem has been to embed KG entities as well as the query into a vector space such that entities that answer the query are embedded close to the query. However, prior work models queries as single points in the vector space, which is problematic because a complex query represents a potentially large set of its answer entities, but it is unclear how such a set can be represented as a single point. Furthermore, prior work can only handle queries that use conjunctions ($\wedge$) and existential quantifiers ($\exists$). Handling queries with logical disjunctions ($\vee$) remains an open problem. Here we propose query2box, an embedding-based framework for reasoning over arbitrary queries with $\wedge$, $\vee$, and $\exists$ operators in massive and incomplete KGs. Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query. We show that conjunctions can be naturally represented as intersections of boxes and also prove a negative result that handling disjunctions would require embedding with dimension proportional to the number of KG entities. However, we show that by transforming queries into a Disjunctive Normal Form, query2box is capable of handling arbitrary logical queries with $\wedge$, $\vee$, $\exists$ in a scalable manner. We demonstrate the effectiveness of query2box on two large KGs and show that query2box achieves up to 25% relative improvement over the state of the art.
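The box idea can be illustrated with a minimal sketch in plain Python. It is not the query2box model: the boxes here are hand-written axis-aligned rectangles rather than learned embeddings, the intersection is the ordinary geometric one rather than a learned operator, and the names `Box`, `intersect`, and `answers_dnf` are hypothetical. It only shows that a conjunction corresponds to a box intersection, and that a query in Disjunctive Normal Form can be answered by checking an entity against each conjunctive box in turn.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyper-rectangle given by its lower and upper corners (illustrative only)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def contains(self, point: Sequence[float]) -> bool:
        # A point answers the query if it lies inside the box in every dimension.
        return all(lo <= x <= hi for lo, x, hi in zip(self.lower, point, self.upper))


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Geometric intersection of two boxes; None if they do not overlap."""
    lower = tuple(max(x, y) for x, y in zip(a.lower, b.lower))
    upper = tuple(min(x, y) for x, y in zip(a.upper, b.upper))
    if any(lo > hi for lo, hi in zip(lower, upper)):
        return None
    return Box(lower, upper)


def answers_dnf(point: Sequence[float], conjunctive_boxes: Sequence[Optional[Box]]) -> bool:
    """A DNF query is a union of conjunctive queries: the point must lie in at least one box."""
    return any(box is not None and box.contains(point) for box in conjunctive_boxes)


# Toy 2-D example with made-up coordinates.
a = Box((0, 0), (2, 2))
b = Box((1, 1), (3, 3))
c = Box((5, 5), (6, 6))

conjunction = intersect(a, b)   # region satisfying (a AND b)
dnf_query = [conjunction, c]    # (a AND b) OR c

in_conjunction = answers_dnf((1.5, 1.5), dnf_query)  # inside a AND b
in_disjunct = answers_dnf((5.5, 5.5), dnf_query)     # inside c only
in_neither = answers_dnf((0.5, 0.5), dnf_query)      # inside a but not b, and not c
```

Keeping the disjunction as a list of boxes, instead of trying to merge them into one box, mirrors the abstract's point that the union is handled by the normal-form transformation rather than by a single embedding.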
