A Closer Look at Few-Shot Crosslingual Transfer: Variance, Benchmarks and Baselines

We present a focused study of few-shot crosslingual transfer, a recently proposed NLP scenario: a pretrained multilingual encoder is first finetuned on many annotations in a high-resource language (typically English) and then finetuned on a few annotations (the “few shots”) in a target language. Few-shot transfer brings large improvements over zero-shot transfer. However, we show that it inherently has large variance, and that reporting results on multiple sets of few shots is necessary both for stable numbers and for fair comparison of different algorithms. To address this problem, we publish our few-shot sets. In a study of why few-shot transfer outperforms zero-shot transfer, we show that large models rely heavily on lexical hints when finetuned on a few shots and then overfit quickly. Finally, we evaluate different methods of using the few-shot annotations but observe no significant improvements over the baseline, which calls for better ways of utilizing the few-shot annotations.
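
As a concrete illustration of the scenario described above, here is a minimal sketch of the two-stage fine-tuning recipe together with the multi-set evaluation the abstract argues for. It is a sketch under stated assumptions, not the authors' exact implementation: the model name, hyperparameters, and in-memory dataset format (pre-tokenized dicts of tensors) are illustrative placeholders.

```python
# Minimal sketch of two-stage few-shot crosslingual transfer plus a
# multi-set evaluation protocol. All concrete values (model name,
# hyperparameters, dataset format) are illustrative assumptions.
import random

import numpy as np
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # any pretrained multilingual encoder


def finetune(model, dataset, epochs=3, lr=2e-5, batch_size=32):
    """Plain supervised fine-tuning; `dataset` is a list of dicts holding
    pre-tokenized `input_ids`, `attention_mask`, and `labels` tensors."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss  # cross-entropy from the labels
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model


@torch.no_grad()
def evaluate(model, dataset, batch_size=32):
    """Accuracy on a held-out target-language test set."""
    loader = DataLoader(dataset, batch_size=batch_size)
    model.eval()
    correct = total = 0
    for batch in loader:
        inputs = {k: v for k, v in batch.items() if k != "labels"}
        preds = model(**inputs).logits.argmax(dim=-1)
        correct += (preds == batch["labels"]).sum().item()
        total += batch["labels"].numel()
    return correct / total


def few_shot_transfer(source_data, target_pool, target_test, k=32, n_sets=5):
    """Stage 1: fine-tune on abundant source-language (e.g. English) data.
    Stage 2: continue fine-tuning on k target-language examples. Because
    few-shot results have high variance, stage 2 is repeated over several
    independently sampled few-shot sets and the score distribution is
    reported rather than a single number."""
    scores = []
    for seed in range(n_sets):
        random.seed(seed)
        torch.manual_seed(seed)
        model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
        finetune(model, source_data)                     # stage 1: source language
        few_shots = random.sample(list(target_pool), k)  # one few-shot set
        finetune(model, few_shots, epochs=10)            # stage 2: target few shots
        scores.append(evaluate(model, target_test))
    return float(np.mean(scores)), float(np.std(scores))  # mean and std across sets
```

The key design point is that each iteration samples its own few-shot set and reruns stage 2 from the same stage-1 checkpoint, so the reported mean and standard deviation reflect exactly the variance across few-shot sets that the paper warns against hiding behind a single-run score.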
