Meta-learning for Few-shot Natural Language Processing: A Survey

Few-shot natural language processing (NLP) refers to NLP tasks that come with only a handful of labeled examples. This is a real-world challenge that an AI system must learn to handle. The usual remedies are to collect more auxiliary information or to design a more data-efficient learning algorithm. However, general gradient-based optimization in high-capacity models, when training from scratch, requires many parameter-update steps over a large number of labeled examples to perform well (Snell et al., 2017). If the target task itself cannot provide more information, can we instead collect many related tasks with rich annotations to help the model learn? This is the goal of meta-learning: train a model on a variety of richly annotated tasks so that it can solve a new task from only a few labeled samples. The key idea is to learn the model's initial parameters such that it achieves maximal performance on a new task after those parameters have been updated by zero or a few gradient steps. Several surveys of meta-learning already exist (Vilalta and Drissi, 2002; Vanschoren, 2018; Hospedales et al., 2020); this paper, in contrast, focuses on the NLP domain, and on few-shot applications in particular. We aim to provide clearer definitions, a summary of recent progress, and a collection of common datasets for applying meta-learning to few-shot NLP.
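To make the "learn an initialization that adapts in a few gradient steps" idea concrete, the following is a minimal sketch of a first-order variant of model-agnostic meta-learning (MAML; [28]) in PyTorch. It is illustrative only: the sample_task episode generator, the network sizes, and the learning rates are assumptions for the sketch, not details from any surveyed paper.

    import copy
    import torch
    import torch.nn as nn

    def sample_task(n_support=5, n_query=15, dim=16):
        # Hypothetical episode generator: each "task" is a random linear
        # binary-classification problem with its own decision boundary.
        w = torch.randn(dim)
        xs, xq = torch.randn(n_support, dim), torch.randn(n_query, dim)
        return (xs, (xs @ w > 0).long()), (xq, (xq @ w > 0).long())

    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    inner_lr, inner_steps = 0.1, 3

    for episode in range(1000):
        (xs, ys), (xq, yq) = sample_task()

        # Inner loop: adapt a copy of the current initialization on the
        # few labeled support examples of this episode's task.
        learner = copy.deepcopy(model)
        for _ in range(inner_steps):
            grads = torch.autograd.grad(loss_fn(learner(xs), ys),
                                        learner.parameters())
            with torch.no_grad():
                for p, g in zip(learner.parameters(), grads):
                    p -= inner_lr * g

        # Outer loop: evaluate the adapted learner on the query set and
        # apply its gradients to the shared initialization (first-order
        # approximation: we do not differentiate through the inner loop).
        grads = torch.autograd.grad(loss_fn(learner(xq), yq),
                                    learner.parameters())
        meta_opt.zero_grad()
        for p, g in zip(model.parameters(), grads):
            p.grad = g.detach()
        meta_opt.step()

Full MAML differentiates through the inner loop, which requires second-order gradients; the first-order approximation above simply reuses the adapted parameters' query-set gradients, a simplification analyzed in [24].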

[1] Omer Levy et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.

[2] Jian Sun et al. Induction Networks for Few-Shot Text Classification, 2019, EMNLP-IJCNLP.

[3] Zhiyuan Liu et al. Hybrid Attention-Based Prototypical Networks for Noisy Few-Shot Relation Classification, 2019, AAAI.

[4] Francesco Caltagirone et al. Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces, 2018, arXiv.

[5] Philip S. Yu et al. Zero-shot User Intent Detection via Capsule Neural Networks, 2018, EMNLP.

[6] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[7] Ricardo Vilalta et al. A Perspective View and Survey of Meta-Learning, 2002, Artificial Intelligence Review.

[8] Peter Clark et al. SciTaiL: A Textual Entailment Dataset from Science Question Answering, 2018, AAAI.

[9] Huajun Chen et al. Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection, 2020, WSDM.

[10] Zi-Yi Dou et al. Investigating Meta-Learning Algorithms for Low-Resource Natural Language Understanding Tasks, 2019, EMNLP.

[11] Oriol Vinyals et al. Matching Networks for One Shot Learning, 2016, NIPS.

[12] Andrew McCallum et al. Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks, 2020, COLING.

[13] Joaquin Vanschoren et al. Meta-Learning: A Survey, 2018, Automated Machine Learning.

[14] Amos Storkey et al. Meta-Learning in Neural Networks: A Survey, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] Christopher Potts et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.

[16] Zhiyuan Liu et al. FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation, 2018, EMNLP.

[17] Hugo Larochelle et al. Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples, 2019, ICLR.

[18] Lingjia Tang et al. An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction, 2019, EMNLP.

[19] Stan Matwin et al. Attentive Task-Agnostic Meta-Learning for Few-Shot Text Classification, 2018.

[20] John Blitzer et al. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, 2007, ACL.

[21] Sebastian Thrun et al. Learning to Learn: Introduction and Overview, 1998, Learning to Learn.

[22] Philip S. Yu et al. CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection, 2020, arXiv.

[23] Richard S. Zemel et al. Prototypical Networks for Few-shot Learning, 2017, NIPS.

[24] Joshua Achiam et al. On First-Order Meta-Learning Algorithms, 2018, arXiv.

[25] Shengli Sun et al. Hierarchical Attention Prototypical Networks for Few-Shot Text Classification, 2019, EMNLP.

[26] Yu Cheng et al. Diverse Few-Shot Text Classification with Multiple Metrics, 2018, NAACL.

[27] Tao Xiang et al. Learning to Compare: Relation Network for Few-Shot Learning, 2018, CVPR.

[28] Sergey Levine et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[29] Franck Dernoncourt et al. Exploiting the Matching Information in the Support Set for Few Shot Event Classification, 2020, PAKDD.

[30] Maosong Sun et al. FewRel 2.0: Towards More Challenging Few-Shot Relation Classification, 2019, EMNLP.

[31] Helen Yannakoudakis et al. Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation, 2020, EMNLP.

[32] Gregory R. Koch et al. Siamese Neural Networks for One-Shot Image Recognition, 2015.

[33] Jun Zhao et al. Knowledge Guided Metric Learning for Few-Shot Text Classification, 2020, NAACL.