Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
