Do Explicit Alignments Robustly Improve Massively Multilingual Encoders?

Multilingual BERT (mBERT), XLM-RoBERTa (XLMR), and other unsupervised multilingual encoders can effectively learn cross-lingual representations. Explicit alignment objectives based on bitexts like Europarl or MultiUN have been shown to further improve these representations. However, word-level alignments are often suboptimal, and such bitexts are unavailable for many languages. In this paper, we propose a new contrastive alignment objective that can better utilize such signal, and we examine whether previous alignment methods can be adapted to noisier sources of aligned data: a randomly sampled one-million-pair subset of the OPUS collection. Additionally, rather than reporting results on a single dataset with a single model run, we report the mean and standard deviation of multiple runs with different seeds, on four datasets and tasks. Our more extensive analysis finds that, while our new objective outperforms previous work, these methods overall do not improve performance under this more robust evaluation framework. Furthermore, the gains from using a better underlying model eclipse any benefits from alignment training. These negative results dictate more care in evaluating these methods and suggest limitations in applying explicit alignment objectives.
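The abstract does not spell out the exact form of the proposed contrastive alignment objective. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss over word pairs produced by an external word aligner: each source-word embedding is pulled toward its aligned target-word embedding and pushed away from the other target embeddings in the batch. The function name, tensor shapes, and temperature value are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_vecs, tgt_vecs, temperature=0.1):
    """InfoNCE-style loss over a batch of aligned word pairs (illustrative sketch).

    src_vecs, tgt_vecs: (batch, dim) contextual embeddings of source/target
    words paired by a word aligner. Diagonal entries of the similarity
    matrix are treated as positives; all other targets in the batch are
    in-batch negatives.
    """
    src = F.normalize(src_vecs, dim=-1)
    tgt = F.normalize(tgt_vecs, dim=-1)
    logits = src @ tgt.t() / temperature                  # (batch, batch) cosine similarities
    labels = torch.arange(src.size(0), device=src.device)  # positive index = row index
    return F.cross_entropy(logits, labels)

# Toy usage with random vectors standing in for encoder outputs.
src = torch.randn(32, 768)
tgt = torch.randn(32, 768)
loss = contrastive_alignment_loss(src, tgt)
```

A loss of this shape only requires aligned word pairs and in-batch negatives, which is one reason contrastive formulations are attractive when the underlying alignments and bitexts (e.g., from OPUS) are noisy.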
