Improving Local Identifiability in Probabilistic Box Embeddings

Geometric embeddings have recently received attention for their natural ability to represent transitive asymmetric relations via containment. Box embeddings, where objects are represented by n-dimensional hyperrectangles, are a particularly promising example of such an embedding, as they are closed under intersection and their volume can be calculated easily, allowing them to naturally represent calibrated probability distributions. The benefits of geometric embeddings come with a problem of local identifiability, however: whole neighborhoods of parameters result in the same loss, which impedes learning. Prior work addressed some of these issues by using an approximation to Gaussian convolution over the box parameters; however, the associated intersection operation also increases the sparsity of the gradient. In this work, we model the box parameters with min and max Gumbel distributions, chosen so that the space remains closed under intersection. The resulting calculation of the expected intersection volume involves all parameters, and we demonstrate experimentally that this drastically improves the ability of such models to learn.
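To make the contrast concrete, the sketch below (in Python/NumPy, not taken from the paper) compares the volume of a hard box intersection, whose gradient vanishes in any dimension where the boxes are disjoint, with a Gumbel-box expected volume in which every parameter participates. The function names, the temperature parameter beta, and the logsumexp/softplus approximations are illustrative assumptions consistent with the abstract's description, not the authors' exact implementation.

    # Minimal sketch (not the authors' code): hard-box vs. Gumbel-box volumes.
    # The temperature `beta` and the softplus approximation are assumptions
    # made for illustration, based on the description in the abstract.
    import numpy as np

    EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


    def hard_intersection_volume(min_a, max_a, min_b, max_b):
        """Volume of the intersection of two hard boxes.

        The gradient is exactly zero in any dimension where the boxes are
        disjoint -- the local identifiability problem described above.
        """
        side = np.maximum(np.minimum(max_a, max_b) - np.maximum(min_a, min_b), 0.0)
        return np.prod(side)


    def gumbel_intersection_params(min_a, max_a, min_b, max_b, beta=1.0):
        """Location parameters of the intersection of two Gumbel boxes.

        Min coordinates follow a MaxGumbel(mu, beta) and max coordinates a
        MinGumbel(mu, beta); both families are closed under max/min, so the
        intersection is again a Gumbel box with logsumexp-combined locations.
        """
        inter_min = beta * np.logaddexp(min_a / beta, min_b / beta)
        inter_max = -beta * np.logaddexp(-max_a / beta, -max_b / beta)
        return inter_min, inter_max


    def expected_gumbel_volume(mu_min, mu_max, beta=1.0):
        """Softplus-style approximation to the expected volume of a Gumbel box.

        Every parameter contributes to this quantity, so the gradient stays
        dense even when the boxes barely (or do not) overlap.
        """
        side = beta * np.log1p(np.exp((mu_max - mu_min) / beta - 2 * EULER_GAMMA))
        return np.prod(side)


    # Example: two 2-d boxes that just miss each other.
    min_a, max_a = np.array([0.0, 0.0]), np.array([1.0, 1.0])
    min_b, max_b = np.array([1.1, 0.5]), np.array([2.0, 1.5])
    print(hard_intersection_volume(min_a, max_a, min_b, max_b))   # 0.0, zero gradient
    i_min, i_max = gumbel_intersection_params(min_a, max_a, min_b, max_b, beta=0.5)
    print(expected_gumbel_volume(i_min, i_max, beta=0.5))          # small but nonzero

In the hard-box case the training signal disappears entirely once the boxes separate, whereas the Gumbel-box expected volume remains a smooth function of all location parameters, which is the property the abstract credits for the improved learning behavior.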
