Characterizing the Impact of Geometric Properties of Word Embeddings on Task Performance

Analysis of word embedding properties to inform their use in downstream NLP tasks has largely relied on assessing nearest neighbors. However, geometric properties of the continuous feature space contribute directly to the use of embedding features in downstream models, and remain largely unexplored. We consider four properties of word embedding geometry: position relative to the origin, distribution of features in the vector space, global pairwise distances, and local pairwise distances. We define a sequence of transformations that generate new embeddings exposing subsets of these properties to downstream models, and evaluate the resulting change in task performance to understand the contribution of each property. We transform publicly available pretrained embeddings from three popular toolkits (word2vec, GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model linguistic information directly in the vector space, and extrinsic tasks, which use the vectors as input to machine learning models. We find that intrinsic evaluations are highly sensitive to absolute position, while extrinsic tasks rely primarily on local similarity. Our findings suggest that future embedding models and post-processing techniques should focus primarily on similarity to nearby points in vector space.
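
To make the idea of property-isolating transformations concrete, the sketch below shows the kinds of affine operations the abstract describes, applied to an embedding matrix. This is a minimal illustration under stated assumptions, not the paper's exact transformation sequence: the function names and the random stand-in matrix are hypothetical, and only standard NumPy calls are used. Translation changes position relative to the origin while preserving all pairwise distances; an orthogonal rotation preserves norms and pairwise distances while redistributing information across feature dimensions; uniform scaling preserves cosine similarities but alters global Euclidean distances.

```python
import numpy as np

def translate(emb, offset):
    # Shift every vector by a constant offset: changes absolute position
    # relative to the origin, preserves all pairwise distances.
    return emb + offset

def rotate(emb, seed=0):
    # Apply a random orthogonal rotation (Q from a QR decomposition of a
    # Gaussian matrix): preserves norms and pairwise distances, but
    # redistributes information across dimensions.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((emb.shape[1], emb.shape[1])))
    return emb @ q

def scale(emb, factor):
    # Uniformly scale all vectors: preserves cosine similarities and
    # local neighborhood structure, alters global Euclidean distances.
    return emb * factor

# Hypothetical stand-in for a pretrained embedding matrix (vocab x dims).
emb = np.random.randn(10000, 300)

# Example: mean-centering moves the point cloud to the origin, testing
# whether a downstream model depends on absolute position.
centered = translate(emb, -emb.mean(axis=0))
```

Comparing intrinsic and extrinsic task performance before and after each such transformation indicates which geometric property a given model actually exploits.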
