mPMR: A Multilingual Pre-trained Machine Reader at Scale

We present the multilingual Pre-trained Machine Reader (mPMR), a novel method for multilingual machine reading comprehension (MRC)-style pre-training. mPMR guides multilingual pre-trained language models (mPLMs) to perform natural language understanding (NLU), covering both sequence classification and span extraction, in multiple languages. When only source-language fine-tuning data is available, existing mPLMs achieve cross-lingual generalization solely by transferring NLU capability from the source language to target languages. In contrast, mPMR lets downstream tasks directly inherit multilingual NLU capability from MRC-style pre-training, and therefore acquires stronger NLU capability in the target languages. mPMR also provides a unified solver for cross-lingual span extraction and sequence classification, which makes it possible to extract rationales that explain its sentence-pair classification decisions.
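To make the unified MRC formulation concrete, the sketch below shows how a single span-scoring head can serve both task types: extraction tasks read out the best span in the context, while classification tasks verbalize each label as a query and pick the label whose query yields the strongest span, with that span doubling as a rationale. This is a minimal illustration under stated assumptions, not the released mPMR implementation: `xlm-roberta-base` stands in for the mPLM, the `MRCSpanHead` module and the label verbalizations are hypothetical, and the head is untrained here (mPMR would pre-train it with MRC-style data before any fine-tuning).

```python
# Minimal sketch of an MRC-style unified solver; names and conventions
# here are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MRCSpanHead(nn.Module):
    """Scores every (start, end) token pair as a candidate answer span."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.start_proj = nn.Linear(hidden_size, hidden_size)
        self.end_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size)
        s = self.start_proj(hidden)
        e = self.end_proj(hidden)
        # Span score S[b, i, j] for the span running from token i to token j.
        return torch.einsum("bih,bjh->bij", s, e)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")
# Untrained here; mPMR would pre-train this head MRC-style.
head = MRCSpanHead(encoder.config.hidden_size)

@torch.no_grad()
def best_span(query: str, context: str) -> tuple[float, str]:
    """Encode the (query, context) pair and return the top-scoring span."""
    inputs = tokenizer(query, context, return_tensors="pt")
    scores = head(encoder(**inputs).last_hidden_state).squeeze(0)
    # Keep only valid spans (start <= end); a real implementation would
    # also restrict candidates to context tokens rather than the full input.
    scores = scores.masked_fill(torch.ones_like(scores).tril(-1).bool(), -1e9)
    i, j = divmod(scores.argmax().item(), scores.size(1))
    span = tokenizer.decode(inputs["input_ids"][0, i : j + 1])
    return scores[i, j].item(), span

# Span extraction (e.g., NER or extractive QA): the query names what to
# extract, and the best-scoring span is the prediction.
print(best_span("Which organization is mentioned?",
                "Alice joined UNESCO in Paris last year."))

# Sequence classification as extraction: verbalize each label as a query,
# choose the label whose query yields the highest span score, and surface
# the winning span as the rationale for the decision.
labels = {
    "paraphrase": "The two sentences express the same meaning.",
    "not_paraphrase": "The two sentences express different meanings.",
}
pair = "It is raining heavily. Heavy rain is falling."
pred = max(labels, key=lambda l: best_span(labels[l], pair)[0])
print(pred, best_span(labels[pred], pair)[1])
```

Because both task types share one set of span-scoring parameters under this formulation, the spans scored for the winning label can double as rationales for a sentence-pair classification decision, which is the behavior the abstract describes.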
