DReCa: A General Task Augmentation Strategy for Few-Shot Natural Language Inference

Meta-learning promises few-shot learners that can adapt to new distributions by repurposing knowledge acquired from previous training. However, we believe meta-learning has not yet succeeded in NLP due to the lack of a well-defined task distribution, leading to attempts that treat datasets as tasks. Such an ad hoc task distribution creates problems of both quantity and quality. Since there are only a handful of datasets for any NLP problem, meta-learners tend to overfit their adaptation mechanism; and since NLP datasets are highly heterogeneous, many learning episodes have poor transfer between their support and query sets, which discourages the meta-learner from adapting. To alleviate these issues, we propose DReCa (Decomposing datasets into Reasoning Categories), a simple method for discovering and using latent reasoning categories in a dataset to form additional high-quality tasks. DReCa works by splitting examples into label groups, embedding them with a finetuned BERT model, and then clustering each group into reasoning categories. Across four few-shot NLI problems, we demonstrate that using DReCa improves the accuracy of meta-learners by 1.5-4%.
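
The split-embed-cluster pipeline described in the abstract can be summarized in a short sketch. The following is a minimal illustration, not the authors' released code: it assumes a Hugging Face `bert-base-uncased` encoder as a stand-in for the finetuned BERT model, [CLS]-token pooling, and k-means with an illustrative number of clusters per label group; the helper names `embed` and `dreca_tasks` and the cluster count are our own assumptions.

```python
# Sketch of a DReCa-style decomposition: group NLI examples by label,
# embed each example with a BERT encoder, and cluster each label group
# into latent "reasoning categories".
from collections import defaultdict

import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"   # assumed stand-in; DReCa uses a BERT model finetuned on NLI
NUM_CLUSTERS = 3                   # assumed number of reasoning categories per label group

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()


def embed(premise: str, hypothesis: str) -> torch.Tensor:
    """Return the [CLS] embedding for a premise-hypothesis pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze(0)  # [CLS] token


def dreca_tasks(examples):
    """examples: list of dicts with 'premise', 'hypothesis', 'label' keys.
    Returns a mapping (label, cluster_id) -> list of examples."""
    # 1) Split examples into label groups.
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)

    tasks = {}
    for label, group in by_label.items():
        # 2) Embed every example in the group.
        vectors = torch.stack(
            [embed(ex["premise"], ex["hypothesis"]) for ex in group]
        )
        # 3) Cluster the group into reasoning categories.
        kmeans = KMeans(n_clusters=min(NUM_CLUSTERS, len(group)),
                        n_init=10, random_state=0)
        assignments = kmeans.fit_predict(vectors.numpy())
        for ex, cluster_id in zip(group, assignments):
            tasks.setdefault((label, int(cluster_id)), []).append(ex)
    return tasks
```

In this reading of the method, the resulting (label, cluster) groups supply the additional tasks from which few-shot meta-learning episodes can be sampled, alongside the original dataset-level tasks.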
