Word2Box: Learning Word Representation Using Box Embeddings

Learning vector representations for words is one of the most fundamental topics in NLP, and such representations capture syntactic and semantic relationships useful in a variety of downstream NLP tasks. Vector representations can be limiting, however, in that typical scoring functions such as the dot product intertwine a vector's position and magnitude in space. Exciting innovations in representation learning have proposed alternative fundamental representations, such as distributions, hyperbolic vectors, and regions. Our model, WORD2BOX, takes a region-based approach to word representation, representing each word as an n-dimensional rectangle (a "box"). These representations encode position and breadth independently, and support additional geometric operations, such as intersection and containment, which allow them to model co-occurrence patterns that vectors struggle to capture. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a qualitative analysis exploring the unique expressivity provided by WORD2BOX.
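To make the geometric intuition concrete, here is a minimal, hypothetical sketch of a box embedding layer: each word is parameterized by a minimum corner (position) and positive side lengths (breadth), and similarity is scored by the log-volume of the intersection of two boxes. All names are illustrative, and the hard clamp below is a simplification; the published model trains with smoothed (Gumbel) box intersections rather than this non-differentiable variant.

```python
import torch

class BoxEmbedding(torch.nn.Module):
    """Sketch: each word is an n-dimensional rectangle with independent
    position (min corner) and breadth (side lengths)."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.min_corner = torch.nn.Embedding(vocab_size, dim)  # position
        self.side_log = torch.nn.Embedding(vocab_size, dim)    # log breadth

    def boxes(self, ids: torch.Tensor):
        z = self.min_corner(ids)           # min corner z
        Z = z + self.side_log(ids).exp()   # max corner Z = z + exp(log side) > z
        return z, Z

    def score(self, ids_a: torch.Tensor, ids_b: torch.Tensor) -> torch.Tensor:
        """Log-volume of the intersection box (hard, non-smoothed variant)."""
        za, Za = self.boxes(ids_a)
        zb, Zb = self.boxes(ids_b)
        lo = torch.maximum(za, zb)          # intersection min corner
        hi = torch.minimum(Za, Zb)          # intersection max corner
        side = (hi - lo).clamp(min=1e-9)    # empty overlap clipped to ~0 volume
        return side.log().sum(-1)           # log volume = sum of log side lengths

emb = BoxEmbedding(vocab_size=10_000, dim=64)
a, b = torch.tensor([3]), torch.tensor([7])
print(emb.score(a, b))  # higher score = greater overlap between the two word boxes
```

Note how the two parameter tables separate where a box sits from how large it is, which is exactly the position/breadth decoupling the abstract describes; containment (one box lying inside another) falls out of the same min/max corner arithmetic.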
