On Learning Universal Representations Across Languages

Recent studies have demonstrated the overwhelming advantage of cross-lingual pre-trained models (PTMs), such as multilingual BERT and XLM, on cross-lingual NLP tasks. However, existing approaches essentially capture co-occurrence among tokens by relying on the masked language model (MLM) objective with token-level cross-entropy. In this work, we extend these approaches to learn sentence-level representations and show their effectiveness on cross-lingual understanding and generation. We propose Hierarchical Contrastive Learning (HiCTL) to (1) learn universal representations for parallel sentences distributed across one or multiple languages and (2) distinguish the semantically related words from a shared cross-lingual vocabulary for each sentence. We conduct evaluations on three benchmarks: language understanding tasks (QQP, QNLI, SST-2, MRPC, STS-B and MNLI) from the GLUE benchmark, cross-lingual natural language inference (XNLI), and machine translation. Experimental results show that HiCTL obtains an absolute gain of 1.0%/2.2% accuracy on GLUE/XNLI and achieves substantial improvements of +1.7 to +3.6 BLEU on both high-resource and low-resource English-to-X translation tasks over strong baselines. We will release the source code as soon as possible.
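To make the sentence-level part of the objective concrete, the sketch below shows a generic InfoNCE-style contrastive loss over a batch of parallel sentence embeddings, where each aligned source-target pair is treated as a positive and all other in-batch combinations as negatives. This is a minimal illustration under our own assumptions (function name, pooling into fixed-size embeddings, symmetric cross-entropy, temperature value), not the paper's exact implementation; the word-level term of HiCTL would be built analogously over token representations and a shared cross-lingual vocabulary.

```python
import torch
import torch.nn.functional as F

def sentence_contrastive_loss(src_emb: torch.Tensor,
                              tgt_emb: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """Hypothetical InfoNCE-style loss over parallel sentence embeddings.

    src_emb, tgt_emb: (batch, dim) pooled encoder outputs for aligned
    source/target sentences. The i-th source and i-th target form a
    positive pair; every other in-batch pairing acts as a negative.
    """
    # Cosine similarity via L2-normalized dot products.
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric cross-entropy: each source should retrieve its own
    # translation, and each target should retrieve its own source.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```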
