Analogy Training Multilingual Encoders

Language encoders represent words and phrases in ways that capture their local semantic relatedness, but are known to be globally inconsistent. This global inconsistency can seemingly be corrected for, in part, by leveraging signals from knowledge bases, but previous results are partial and limited to monolingual English encoders. We extract a large-scale multilingual, multi-word analogy dataset from Wikidata for diagnosing and correcting global inconsistencies, and we implement a four-way Siamese BERT architecture for grounding multilingual BERT (mBERT) in Wikidata through analogy training. We show that analogy training not only improves the global consistency of mBERT and the isomorphism of its language-specific subspaces, but also leads to significant gains on downstream tasks such as bilingual dictionary induction and sentence retrieval.
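To make the analogy-training idea concrete, here is a minimal illustrative sketch (not the authors' code) of the kind of objective a four-way Siamese setup optimizes: for an analogy quadruple a : b :: c : d drawn from a knowledge base, the relation offset between the encodings of a and b is pushed to match the offset between c and d. The encoder below is a toy lookup table standing in for mBERT, and all names and values are hypothetical.

```python
# Illustrative sketch of a four-way analogy objective.
# For a quadruple (a : b :: c : d), penalize the mismatch between
# the two relation offsets enc(a) - enc(b) and enc(c) - enc(d).

def analogy_loss(enc, a, b, c, d):
    """Squared L2 distance between the two relation offsets."""
    off_ab = [x - y for x, y in zip(enc[a], enc[b])]
    off_cd = [x - y for x, y in zip(enc[c], enc[d])]
    return sum((u - v) ** 2 for u, v in zip(off_ab, off_cd))

# Toy 2-d embeddings: "Paris - France" should mirror "Rome - Italy".
enc = {
    "Paris":  [1.0, 2.0],
    "France": [0.0, 1.0],
    "Rome":   [2.0, 3.0],
    "Italy":  [1.0, 2.0],
}

loss = analogy_loss(enc, "Paris", "France", "Rome", "Italy")
# Both offsets are [1.0, 1.0], so the loss is 0.0 for this
# globally consistent quadruple.
```

In the paper's setting, the lookup table would be replaced by a shared (weight-tied) mBERT encoder over the four analogy terms, and this loss would be minimized over Wikidata-derived quadruples.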
