Enhancing Cross-lingual Prompting with Dual Prompt Augmentation

Prompting shows promising results in few-shot scenarios. However, its strength for multilingual/cross-lingual problems has not been fully exploited. Zhao and Schütze (2021) made initial explorations in this direction by showing that cross-lingual prompting outperforms cross-lingual finetuning. In this paper, we conduct an empirical exploration of the effect of each component in cross-lingual prompting and derive language-agnostic Universal Prompting, which helps alleviate the discrepancy between source-language training and target-language inference. Building on this, we propose DPA, a dual prompt augmentation framework that aims to relieve the data scarcity issue in few-shot cross-lingual prompting. Notably, on XNLI, our method achieves 46.54% accuracy with only 16 English training examples per class, significantly better than the 34.99% achieved by finetuning. Our code is available at https://github.com/DAMO-NLP-SG/DPA.
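For illustration, below is a minimal sketch of cloze-style prompting for XNLI with a multilingual masked language model. The template ("<premise>? <mask>, <hypothesis>"), the English verbalizer (Yes/Maybe/No), and the choice of xlm-roberta-base are assumptions made for this sketch, not the exact Universal Prompting or DPA configuration from the paper.

```python
# Minimal sketch of cloze-style prompting for XNLI with a multilingual masked LM.
# The template, verbalizer, and model choice are illustrative assumptions,
# not the exact Universal Prompting / DPA setup described in the paper.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # any multilingual masked LM with a mask token
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical English verbalizer: one word per NLI label.
VERBALIZER = {"entailment": "Yes", "neutral": "Maybe", "contradiction": "No"}
# Use the first sub-token of each verbalizer word as its label token.
LABEL_TOKEN_IDS = {
    label: tokenizer(word, add_special_tokens=False)["input_ids"][0]
    for label, word in VERBALIZER.items()
}

def classify(premise: str, hypothesis: str) -> str:
    # Cloze template: "<premise>? <mask>, <hypothesis>"
    text = f"{premise}? {tokenizer.mask_token}, {hypothesis}"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the single mask position and read out the logits there.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    mask_logits = logits[0, mask_pos]
    # Predict the label whose verbalizer token scores highest at the mask.
    return max(LABEL_TOKEN_IDS, key=lambda label: mask_logits[LABEL_TOKEN_IDS[label]].item())

print(classify("A man is playing a guitar.", "A person is making music."))
```

In the few-shot setting, the same masked-LM head would first be finetuned on the handful of English training examples, and the filled-in cloze is then scored directly in each target language at inference time.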

[1] Wai Lam, et al. mPMR: A Multilingual Pre-trained Machine Reader at Scale, 2023, ACL.

[2] E. Cambria, et al. Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning, 2023, arXiv.

[3] E. Cambria, et al. ConNER: Consistency Training for Cross-lingual Named Entity Recognition, 2022, EMNLP.

[4] Hinrich Schütze, et al. Discrete and Soft Prompting for Multilingual Models, 2021, EMNLP.

[5] Zhilin Yang, et al. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning, 2021, ACL.

[6] Fabio Petroni, et al. Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models, 2021, Findings of ACL.

[7] Douwe Kiela, et al. True Few-Shot Learning with Language Models, 2021, NeurIPS.

[8] Alexander M. Rush, et al. How Many Data Points is a Prompt Worth?, 2021, NAACL.

[9] Wancong Zhang, et al. MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture, 2021, arXiv.

[10] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL.

[11] Anna Korhonen, et al. A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters, 2020, ACL.

[12] Graham Neubig, et al. How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering, 2020, TACL.

[13] Ramit Sawhney, et al. Augmenting NLP Models using Latent Feature Interpolations, 2020, COLING.

[14] H. Ng, et al. Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training, 2020, EMNLP.

[15] Hwee Tou Ng, et al. Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model, 2020, IJCAI.

[16] Kilian Q. Weinberger, et al. Revisiting Few-sample BERT Fine-tuning, 2020, ICLR.

[17] Maksym Andriushchenko, et al. On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines, 2020, ICLR.

[18] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[19] Y. Lu, et al. Don't Use English Dev: On the Zero-Shot Cross-Lingual Evaluation of Contextual Embeddings, 2020, EMNLP.

[20] Diyi Yang, et al. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification, 2020, ACL.

[21] Timo Schick, et al. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference, 2020, EACL.

[22] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[23] Myle Ott, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.

[24] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.

[25] Hongyu Guo, et al. Augmenting Data with Mixup for Sentence Classification: An Empirical Study, 2019, arXiv.

[26] Quoc V. Le, et al. Unsupervised Data Augmentation for Consistency Training, 2019, NeurIPS.

[27] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[28] Meng Zhou, et al. From Clozing to Comprehending: Retrofitting Pre-trained Language Model to Pre-trained Machine Reader, 2022, arXiv.

[29] Xi Victoria Lin, et al. Few-shot Learning with Multilingual Generative Language Models, 2022, EMNLP.

[30] Jianfeng Du, et al. Enhancing Cross-lingual Natural Language Inference by Prompt-learning from Cross-lingual Templates, 2022, ACL.

[31] Chunyan Miao, et al. MulDA: A Multilingual Data Augmentation Framework for Low-Resource Cross-Lingual NER, 2021, ACL.

[32] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[33] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.