Revisiting the linearity in cross-lingual embedding mappings: from a perspective of word analogies

Most cross-lingual embedding mapping algorithms assume that the optimised transformation function is linear. Recent studies have shown that learning a linear mapping sometimes fails, indicating that this commonly used assumption does not always hold. However, it remains unclear under which conditions the linearity of cross-lingual embedding mappings holds. In this paper, we explain rigorously that the linearity assumption relies on the consistency of the analogical relations encoded by multilingual embeddings. We conduct extensive experiments to validate this claim. Empirical results on the analogy completion benchmark and the bilingual lexicon induction (BLI) task demonstrate a strong correlation between whether a mapping preserves analogical information and whether it is linear.
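One direction of this relationship can be checked on toy data: if a - b = c - d in the source space, then any linear map M gives Ma - Mb = Mc - Md, so the analogy survives the mapping. The sketch below illustrates this under stated assumptions. The two-dimensional vectors, the rotation matrix, and the helper names (`apply_linear`, `is_analogy`, `nonlinear_map`) are hypothetical and are not taken from the paper, and the code does not reproduce the paper's experiments.

```python
def apply_linear(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def offset(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def is_analogy(a, b, c, d, tol=1e-9):
    """True if a - b equals c - d up to a small tolerance."""
    return all(abs(p - q) <= tol for p, q in zip(offset(a, b), offset(c, d)))


def nonlinear_map(vec):
    """A toy non-linear map: the first coordinate becomes a product."""
    return [vec[0] * vec[1], vec[1]]


# Toy source-space embeddings chosen so that king - man == queen - woman.
king, man = [3.0, 1.0], [1.0, 1.0]
queen, woman = [3.0, 2.0], [1.0, 2.0]
words = [king, man, queen, woman]

# A 90-degree rotation stands in for a learned linear (orthogonal) mapping.
rotation = [[0.0, -1.0], [1.0, 0.0]]

source_ok = is_analogy(*words)
linear_ok = is_analogy(*[apply_linear(rotation, w) for w in words])
nonlinear_ok = is_analogy(*[nonlinear_map(w) for w in words])

print(source_ok, linear_ok, nonlinear_ok)
```

On this toy data the analogy holds in the source space and after the rotation, but not after the non-linear map. A non-linear map can still preserve a particular analogy by coincidence, so the example only illustrates the intuition and proves nothing about the converse direction.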
