Improving Cross-Lingual Word Embeddings by Meeting in the Middle

Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be learned effectively by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. The aim of this transformation is to obtain a better cross-lingual integration of the vector spaces. In addition, and perhaps surprisingly, the monolingual spaces also improve as a result of this transformation. This is in contrast to the original alignment, which is typically learned such that the structure of the monolingual spaces is preserved. Our experiments confirm that the resulting cross-lingual embeddings outperform state-of-the-art models in both monolingual and cross-lingual evaluation tasks.
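The two-step procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on two assumptions: the initial alignment uses orthogonal Procrustes (one common choice for dictionary-supervised linear alignment), and the "meeting in the middle" step is modelled as an unconstrained least-squares linear map from each aligned space onto the midpoints of the dictionary pairs. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def procrustes(X, Y):
    # Orthogonal Procrustes: the orthogonal W minimising ||X @ W - Y||_F
    # is U @ Vt, where U, S, Vt is the SVD of X.T @ Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def meet_in_the_middle(X, Y):
    # X, Y: (n, d) arrays; row i of X and row i of Y form a dictionary pair.
    # Step 1 (assumed): orthogonal alignment of the source space onto the target.
    W = procrustes(X, Y)
    X_aligned = X @ W
    # Step 2: midpoint of each aligned translation pair.
    M = (X_aligned + Y) / 2.0
    # Step 3 (assumed): unconstrained linear maps pulling each space towards
    # the midpoints, fitted by ordinary least squares.
    A_src, *_ = np.linalg.lstsq(X_aligned, M, rcond=None)
    A_tgt, *_ = np.linalg.lstsq(Y, M, rcond=None)
    return W, A_src, A_tgt


# Hypothetical synthetic data: the source space is a noisy rotation of the target.
rng = np.random.default_rng(0)
n, d = 200, 50
Y = rng.normal(size=(n, d))                    # target-language dictionary vectors
R, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
X = Y @ R.T + 0.1 * rng.normal(size=(n, d))    # source-language dictionary vectors

W, A_src, A_tgt = meet_in_the_middle(X, Y)
before = np.linalg.norm(X @ W - Y)                  # pair distance after step 1
after = np.linalg.norm(X @ W @ A_src - Y @ A_tgt)   # pair distance after step 3
print(f"after Procrustes: {before:.2f}, after meeting in the middle: {after:.2f}")
```

The maps are fitted on dictionary pairs only; a full source vocabulary matrix `S` would then be mapped as `S @ W @ A_src` and a target vocabulary matrix `T` as `T @ A_tgt`. Because the identity is always a feasible solution of each least-squares problem, the training-pair distance after the second step cannot exceed the distance after Procrustes. Unlike the first map, the second is not constrained to be orthogonal, so it is free to change the geometry of each monolingual space.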