Why does PairDiff work? - A Mathematical Analysis of Bilinear Relational Compositional Operators for Analogy Detection

Representing the semantic relations that exist between two given words (or entities) is an important first step in a wide range of NLP applications such as analogical reasoning, knowledge base completion and relational information retrieval. A simple, yet surprisingly accurate method for representing the relation between two words is to compute the vector offset (\PairDiff) between their corresponding word embeddings. Despite this empirical success, it remains unclear whether \PairDiff is the best operator for obtaining a relational representation from word embeddings. We conduct a theoretical analysis of generalised bilinear operators that can be used to measure the $\ell_{2}$ relational distance between two word pairs. We show that, if the word embeddings are standardised and uncorrelated, such an operator is independent of the bilinear terms and reduces to a linear form, of which \PairDiff is a special case. We empirically verify this uncorrelatedness assumption for numerous word embedding types, demonstrating the general applicability of our theoretical result. Moreover, we experimentally recover \PairDiff from the learnt bilinear relational compositional operator on several benchmark analogy datasets.
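As an illustration of the \PairDiff operator and the $\ell_{2}$ relational distance it induces between word pairs, the following minimal NumPy sketch shows how the relation between two words is represented and compared. The vocabulary and embeddings here are hypothetical stand-ins (random vectors), not the embeddings used in the paper; in practice they would come from a pre-trained model such as word2vec or GloVe.

```python
import numpy as np

# Minimal sketch of the PairDiff relational representation and the l2
# relational distance between word pairs. The embeddings below are random
# stand-ins; real experiments would use pre-trained embeddings.
rng = np.random.default_rng(0)
vocab = ["man", "woman", "king", "queen", "paris", "france"]  # hypothetical word list
E = {w: rng.normal(size=50) for w in vocab}                   # hypothetical 50-d embeddings

def pair_diff(a, b):
    """PairDiff representation of the relation between words a and b."""
    return E[b] - E[a]

def relational_distance(pair1, pair2):
    """l2 distance between the PairDiff representations of two word pairs."""
    return np.linalg.norm(pair_diff(*pair1) - pair_diff(*pair2))

# Analogy detection: pick the candidate pair whose relation is closest
# to that of the query pair.
query = ("man", "woman")
candidates = [("king", "queen"), ("paris", "france")]
best = min(candidates, key=lambda p: relational_distance(query, p))
print("closest relation to", query, "is", best)

# The theoretical result assumes standardised, uncorrelated embedding
# dimensions; with real embeddings one can check how far the empirical
# covariance matrix departs from the identity.
X = np.stack(list(E.values()))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each dimension
cov = np.cov(X, rowvar=False)
print("max off-diagonal |covariance|:", np.abs(cov - np.diag(np.diag(cov))).max())
```

The final check mirrors the paper's assumption: when the off-diagonal covariances are close to zero and each dimension has unit variance, the generalised bilinear distance collapses to a linear form for which \PairDiff is a special case.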
