Word Translation Without Parallel Data

State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method also works well for distant language pairs, such as English-Russian or English-Chinese. Finally, we describe experiments on the low-resource English-Esperanto language pair, for which only a limited amount of parallel data exists, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings, and dictionaries are publicly available.
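The alignment at the heart of such approaches can be illustrated by the orthogonal Procrustes step: given paired source and target vectors (from a seed dictionary, however obtained), the best orthogonal map between the two embedding spaces has a closed-form SVD solution. The sketch below is a minimal NumPy illustration with random toy data standing in for real embeddings; the unsupervised seed-induction stage is omitted.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal matrix W minimizing ||X W^T - Y||_F,
    where rows of X and Y are paired source/target word vectors.
    Closed-form solution: W = U V^T with U S V^T = SVD(Y^T X)."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

# Toy check: Y is an exact rotation of X, so W should recover it.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # 100 "words", 5-dim vectors
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random orthogonal map
Y = X @ Q.T                                 # rotated copy of X
W = procrustes_align(X, Y)
print(np.allclose(X @ W.T, Y))              # True: rotation recovered
```

Constraining the map to be orthogonal preserves distances and angles within the source space, which is why this step is a standard refinement once an initial dictionary is available.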
