Lightweight Cross-Lingual Sentence Representation Learning

Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER (Artetxe and Schwenk, 2019b) lead to significant performance improvements on downstream tasks. However, further increases and modifications based on such large-scale models are usually impractical due to memory limitations. In this work, we introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations. We explore different training tasks and observe that current cross-lingual training tasks leave a lot to be desired for this shallow architecture. To ameliorate this, we propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task. We further augment the training task by introducing two computationally-lite sentence-level contrastive learning tasks to enhance the alignment of the cross-lingual sentence representation space, which compensates for the learning bottleneck of the lightweight transformer on generative tasks. Our comparisons with competing models on cross-lingual sentence retrieval and multilingual document classification confirm the effectiveness of the newly proposed training tasks for a shallow model.
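A minimal sketch of the two ingredients the abstract names: a shallow (2-layer) Transformer sentence encoder and a sentence-level contrastive objective that pulls parallel sentence pairs together in a shared embedding space. This is not the authors' released code; the dimensions, pooling choice, and the InfoNCE-style loss below are illustrative assumptions, and the paper's actual token-level reconstruction and contrastive tasks differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightEncoder(nn.Module):
    """A 2-layer Transformer encoder producing a fixed-dimensional sentence vector."""
    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=2048, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # pad_mask: bool tensor (batch, seq), True at padding positions.
        hidden = self.encoder(self.embed(token_ids), src_key_padding_mask=pad_mask)
        # Mean-pool over non-padding positions to obtain the sentence embedding.
        keep = (~pad_mask).unsqueeze(-1).float()
        return (hidden * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)

def sentence_contrastive_loss(src_emb: torch.Tensor, tgt_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss over a batch of parallel sentence pairs: the aligned
    translation is the positive, other in-batch sentences act as negatives."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature              # (batch, batch) similarity matrix
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)
```

In practice such a loss would be combined with the masked-language-model and cross-lingual reconstruction objectives described above; the encoder here could be shared across languages or duplicated into the dual-transformer setup the paper describes.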

[1] Ming Zhou, et al. Explicit Cross-lingual Pre-training for Unsupervised Machine Translation, 2019, EMNLP.

[2] Geoffrey E. Hinton, et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.

[3] Wei Li, et al. Transformer based Multilingual document Embedding model, 2020, arXiv.

[4] Guillaume Wenzek, et al. Trans-gram, Fast Cross-lingual Word-embeddings, 2015, EMNLP.

[5] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.

[6] Iryna Gurevych, et al. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, 2019, EMNLP.

[7] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[8] Taku Kudo, et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, 2018, ACL.

[9] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.

[10] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.

[11] Naveen Arivazhagan, et al. Language-agnostic BERT Sentence Embedding, 2020, arXiv.

[12] Kenneth Heafield, et al. ParaCrawl: Web-Scale Acquisition of Parallel Corpora, 2020, ACL.

[13] Holger Schwenk, et al. A Corpus for Multilingual Document Classification in Eight Languages, 2018, LREC.

[14] Haoran Li, et al. Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification, 2018, Rep4NLP@ACL.

[15] Hermann Ney, et al. Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron, 2019, RepL4NLP@ACL.

[16] Christopher D. Manning, et al. Bilingual Word Representations with Monolingual Quality in Mind, 2015, VS@HLT-NAACL.

[17] Weihua Luo, et al. On Learning Universal Representations Across Languages, 2021, ICLR.

[18] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.

[19] Gary D. Bader, et al. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations, 2020, ACL.

[20] Ray Kurzweil, et al. Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax, 2019, IJCAI.

[21] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.

[22] Josef van Genabith, et al. An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification, 2017, IEEE Journal of Selected Topics in Signal Processing.

[23] Holger Schwenk, et al. Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings, 2018, ACL.

[24] Zhen Wang, et al. Knowledge Graph Embedding by Translating on Hyperplanes, 2014, AAAI.

[25] Veselin Stoyanov, et al. Emerging Cross-lingual Structure in Pretrained Language Models, 2020, ACL.

[26] Sebastian Ruder, et al. MultiFiT: Efficient Multi-lingual Language Model Fine-tuning, 2019, EMNLP/IJCNLP.

[27] Holger Schwenk, et al. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, 2018, Transactions of the Association for Computational Linguistics.

[28] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.

[29] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.

[30] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[31] Ray Kurzweil, et al. Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model, 2019, RepL4NLP@ACL.

[32] Iryna Gurevych, et al. Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation, 2020, EMNLP.

[33] Matthijs Douze, et al. Learning Joint Multilingual Sentence Representations with Neural Machine Translation, 2017, Rep4NLP@ACL.

[34] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[35] Jason Weston, et al. Translating Embeddings for Modeling Multi-relational Data, 2013, NIPS.

[36] Keith Stevens, et al. Effective Parallel Corpus Mining using Bilingual Sentence Embeddings, 2018, WMT.

[37] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.

[38] Martin Jaggi, et al. Robust Cross-lingual Embeddings from Parallel Sentences, 2019, arXiv.