Enhancing Cross-lingual Prompting with Dual Prompt Augmentation

Prompting shows promising results in few-shot scenarios. However, its strength for multilingual/cross-lingual problems has not been fully exploited. Zhao and Schütze (2021) made initial explorations in this direction by showing that cross-lingual prompting outperforms cross-lingual finetuning. In this paper, we conduct an empirical exploration of the effect of each component in cross-lingual prompting and derive language-agnostic Universal Prompting, which helps alleviate the discrepancy between source-language training and target-language inference. Building on this, we propose DPA, a dual prompt augmentation framework that aims to relieve the data scarcity issue in few-shot cross-lingual prompting. Notably, on XNLI, our method achieves 46.54% accuracy with only 16 English training examples per class, significantly better than the 34.99% achieved by finetuning. Our code is available at https://github.com/DAMO-NLP-SG/DPA.
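For illustration, below is a minimal sketch of cloze-style prompting for XNLI with a multilingual masked language model. The template ("<premise>? <mask>, <hypothesis>"), the English verbalizer (Yes/Maybe/No), and the choice of xlm-roberta-base are assumptions made for this sketch, not the exact Universal Prompting or DPA configuration from the paper.

```python
# Minimal sketch of cloze-style prompting for XNLI with a multilingual masked LM.
# The template, verbalizer, and model choice are illustrative assumptions,
# not the exact Universal Prompting / DPA setup described in the paper.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # any multilingual masked LM with a mask token
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical English verbalizer: one word per NLI label.
VERBALIZER = {"entailment": "Yes", "neutral": "Maybe", "contradiction": "No"}
# Use the first sub-token of each verbalizer word as its label token.
LABEL_TOKEN_IDS = {
    label: tokenizer(word, add_special_tokens=False)["input_ids"][0]
    for label, word in VERBALIZER.items()
}

def classify(premise: str, hypothesis: str) -> str:
    # Cloze template: "<premise>? <mask>, <hypothesis>"
    text = f"{premise}? {tokenizer.mask_token}, {hypothesis}"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the single mask position and read out the logits there.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    mask_logits = logits[0, mask_pos]
    # Predict the label whose verbalizer token scores highest at the mask.
    return max(LABEL_TOKEN_IDS, key=lambda label: mask_logits[LABEL_TOKEN_IDS[label]].item())

print(classify("A man is playing a guitar.", "A person is making music."))
```

In the few-shot setting, the same masked-LM head would first be finetuned on the handful of English training examples, and the filled-in cloze is then scored directly in each target language at inference time.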

[1] Wai Lam, et al. mPMR: A Multilingual Pre-trained Machine Reader at Scale, 2023, ACL.

[2] E. Cambria, et al. Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning, 2023, arXiv.

[3] E. Cambria, et al. ConNER: Consistency Training for Cross-lingual Named Entity Recognition, 2022, EMNLP.

[4] Hinrich Schütze, et al. Discrete and Soft Prompting for Multilingual Models, 2021, EMNLP.

[5] Zhilin Yang, et al. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning, 2021, ACL.

[6] Fabio Petroni, et al. Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models, 2021, Findings of ACL.

[7] Douwe Kiela, et al. True Few-Shot Learning with Language Models, 2021, NeurIPS.

[8] Alexander M. Rush, et al. How Many Data Points is a Prompt Worth?, 2021, NAACL.

[9] Wancong Zhang, et al. MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture, 2021, arXiv.

[10] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL.

[11] Anna Korhonen, et al. A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters, 2020, ACL.

[12] Graham Neubig, et al. How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering, 2020, TACL.

[13] Ramit Sawhney, et al. Augmenting NLP Models using Latent Feature Interpolations, 2020, COLING.

[14] H. Ng, et al. Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training, 2020, EMNLP.

[15] Hwee Tou Ng, et al. Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model, 2020, IJCAI.

[16] Kilian Q. Weinberger, et al. Revisiting Few-sample BERT Fine-tuning, 2020, ICLR.

[17] Maksym Andriushchenko, et al. On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines, 2020, ICLR.

[18] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[19] Y. Lu, et al. Don't Use English Dev: On the Zero-Shot Cross-Lingual Evaluation of Contextual Embeddings, 2020, EMNLP.

[20] Diyi Yang, et al. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification, 2020, ACL.

[21] Timo Schick, et al. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference, 2020, EACL.

[22] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[23] Myle Ott, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.

[24] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.

[25] Hongyu Guo, et al. Augmenting Data with Mixup for Sentence Classification: An Empirical Study, 2019, arXiv.

[26] Quoc V. Le, et al. Unsupervised Data Augmentation for Consistency Training, 2019, NeurIPS.

[27] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[28] Meng Zhou, et al. From Clozing to Comprehending: Retrofitting Pre-trained Language Model to Pre-trained Machine Reader, 2022, arXiv.

[29] Xi Victoria Lin, et al. Few-shot Learning with Multilingual Generative Language Models, 2022, EMNLP.

[30] Jianfeng Du, et al. Enhancing Cross-lingual Natural Language Inference by Prompt-learning from Cross-lingual Templates, 2022, ACL.

[31] Chunyan Miao, et al. MulDA: A Multilingual Data Augmentation Framework for Low-Resource Cross-Lingual NER, 2021, ACL.

[32] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[33] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.