Combining Static Word Embeddings and Contextual Representations for Bilingual Lexicon Induction

Bilingual Lexicon Induction (BLI) aims to map words in one language to their translations in another, typically by learning linear projections that align monolingual word representation spaces. Two classes of word representations have been explored for BLI: static word embeddings and contextual representations; however, no prior study has combined the two. In this paper, we propose a simple yet effective mechanism that combines static word embeddings and contextual representations to exploit the advantages of both paradigms. We evaluate the combination mechanism on a variety of language pairs under both supervised and unsupervised BLI benchmark settings. Experiments show that our mechanism consistently improves performance over robust BLI baselines on all language pairs, with average gains of 3.2 points in the supervised setting and 3.1 points in the unsupervised setting.
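To make the alignment-based BLI setup concrete, the following is a minimal sketch in Python. It assumes the standard orthogonal Procrustes solution for the linear projection and, as one illustrative (not necessarily the paper's) combination mechanism, a weighted interpolation of similarity scores computed in the static and contextual spaces; the function names and the weight `lam` are hypothetical.

```python
import numpy as np


def procrustes_mapping(X, Y):
    """Solve the orthogonal Procrustes problem: W = argmin ||XW - Y||_F, W orthogonal.

    X, Y: (n, d) arrays of source / target vectors for a seed dictionary.
    Returns the (d, d) orthogonal mapping W.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def combined_similarity(src_static, tgt_static, src_ctx, tgt_ctx,
                        W_static, W_ctx, lam=0.5):
    """Interpolate cosine similarities from the static and contextual spaces.

    NOTE: interpolating the two similarity matrices with weight `lam` is an
    assumption made for illustration; the paper's exact combination mechanism
    may differ.
    """
    def cosine(A, B):
        A = A / np.linalg.norm(A, axis=1, keepdims=True)
        B = B / np.linalg.norm(B, axis=1, keepdims=True)
        return A @ B.T

    sim_static = cosine(src_static @ W_static, tgt_static)
    sim_ctx = cosine(src_ctx @ W_ctx, tgt_ctx)
    return lam * sim_static + (1.0 - lam) * sim_ctx


# Usage sketch: induce a translation for each source word by taking the
# argmax over target words of the combined similarity matrix.
# predictions = combined_similarity(...).argmax(axis=1)
```

In practice, retrieval criteria such as CSLS are often used instead of a plain argmax to mitigate hubness, but the nearest-neighbour form above suffices to illustrate the combination.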
