Lightweight Cross-Lingual Sentence Representation Learning

Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER (Artetxe and Schwenk, 2019b) lead to significant performance improvements on downstream tasks. However, further increases and modifications based on such large-scale models are usually impractical due to memory limitations. In this work, we introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations. We explore different training tasks and observe that current cross-lingual training tasks leave a lot to be desired for this shallow architecture. To ameliorate this, we propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task. We further augment the training task by introducing two computationally-lite sentence-level contrastive learning tasks to enhance the alignment of the cross-lingual sentence representation space, which compensates for the learning bottleneck of the lightweight transformer on generative tasks. Our comparisons with competing models on cross-lingual sentence retrieval and multilingual document classification confirm the effectiveness of the newly proposed training tasks for a shallow model.
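A minimal sketch of the two ingredients the abstract names: a shallow (2-layer) Transformer sentence encoder and a sentence-level contrastive objective that pulls parallel sentence pairs together in a shared embedding space. This is not the authors' released code; the dimensions, pooling choice, and the InfoNCE-style loss below are illustrative assumptions, and the paper's actual token-level reconstruction and contrastive tasks differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightEncoder(nn.Module):
    """A 2-layer Transformer encoder producing a fixed-dimensional sentence vector."""
    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=2048, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # pad_mask: bool tensor (batch, seq), True at padding positions.
        hidden = self.encoder(self.embed(token_ids), src_key_padding_mask=pad_mask)
        # Mean-pool over non-padding positions to obtain the sentence embedding.
        keep = (~pad_mask).unsqueeze(-1).float()
        return (hidden * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)

def sentence_contrastive_loss(src_emb: torch.Tensor, tgt_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss over a batch of parallel sentence pairs: the aligned
    translation is the positive, other in-batch sentences act as negatives."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature              # (batch, batch) similarity matrix
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)
```

In practice such a loss would be combined with the masked-language-model and cross-lingual reconstruction objectives described above; the encoder here could be shared across languages or duplicated into the dual-transformer setup the paper describes.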

[1] Ming Zhou, et al. Explicit Cross-lingual Pre-training for Unsupervised Machine Translation, 2019, EMNLP.

[2] Geoffrey E. Hinton, et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.

[3] Wei Li, et al. Transformer based Multilingual document Embedding model, 2020, arXiv.

[4] Guillaume Wenzek, et al. Trans-gram, Fast Cross-lingual Word-embeddings, 2015, EMNLP.

[5] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.

[6] Iryna Gurevych, et al. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, 2019, EMNLP.

[7] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[8] Taku Kudo, et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, 2018, ACL.

[9] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.

[10] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.

[11] Naveen Arivazhagan, et al. Language-agnostic BERT Sentence Embedding, 2020, arXiv.

[12] Kenneth Heafield, et al. ParaCrawl: Web-Scale Acquisition of Parallel Corpora, 2020, ACL.

[13] Holger Schwenk, et al. A Corpus for Multilingual Document Classification in Eight Languages, 2018, LREC.

[14] Haoran Li, et al. Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification, 2018, Rep4NLP@ACL.

[15] Hermann Ney, et al. Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron, 2019, RepL4NLP@ACL.

[16] Christopher D. Manning, et al. Bilingual Word Representations with Monolingual Quality in Mind, 2015, VS@HLT-NAACL.

[17] Weihua Luo, et al. On Learning Universal Representations Across Languages, 2021, ICLR.

[18] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.

[19] Gary D. Bader, et al. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations, 2020, ACL.

[20] Ray Kurzweil, et al. Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax, 2019, IJCAI.

[21] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.

[22] Josef van Genabith, et al. An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification, 2017, IEEE Journal of Selected Topics in Signal Processing.

[23] Holger Schwenk, et al. Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings, 2018, ACL.

[24] Zhen Wang, et al. Knowledge Graph Embedding by Translating on Hyperplanes, 2014, AAAI.

[25] Veselin Stoyanov, et al. Emerging Cross-lingual Structure in Pretrained Language Models, 2020, ACL.

[26] Sebastian Ruder, et al. MultiFiT: Efficient Multi-lingual Language Model Fine-tuning, 2019, EMNLP/IJCNLP.

[27] Holger Schwenk, et al. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, 2018, Transactions of the Association for Computational Linguistics.

[28] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.

[29] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.

[30] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[31] Ray Kurzweil, et al. Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model, 2019, RepL4NLP@ACL.

[32] Iryna Gurevych, et al. Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation, 2020, EMNLP.

[33] Matthijs Douze, et al. Learning Joint Multilingual Sentence Representations with Neural Machine Translation, 2017, Rep4NLP@ACL.

[34] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[35] Jason Weston, et al. Translating Embeddings for Modeling Multi-relational Data, 2013, NIPS.

[36] Keith Stevens, et al. Effective Parallel Corpus Mining using Bilingual Sentence Embeddings, 2018, WMT.

[37] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.

[38] Martin Jaggi, et al. Robust Cross-lingual Embeddings from Parallel Sentences, 2019, arXiv.