Syntactic Data Augmentation Increases Robustness to Inference Heuristics

Pretrained neural models such as BERT, when fine-tuned to perform natural language inference (NLI), often show high accuracy on standard datasets, but display a surprising lack of sensitivity to word order on controlled challenge sets. We hypothesize that this issue is not primarily caused by the pretrained model's limitations, but rather by the paucity of crowdsourced NLI examples that might convey the importance of syntactic structure at the fine-tuning stage. We explore several methods to augment standard training sets with syntactically informative examples, generated by applying syntactic transformations to sentences from the MNLI corpus. The best-performing augmentation method, subject/object inversion, improved BERT's accuracy on controlled examples that diagnose sensitivity to word order from 0.28 to 0.73, without affecting performance on the MNLI test set. This improvement generalized beyond the particular construction used for data augmentation, suggesting that augmentation causes BERT to recruit abstract syntactic representations.
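To make the augmentation procedure concrete, below is a minimal sketch of the subject/object inversion transformation described above, assuming spaCy for dependency parsing of simple transitive premises; the function name, the skip conditions, and the use of spaCy are illustrative assumptions, not the paper's released implementation.

```python
# Illustrative sketch of subject/object inversion augmentation (assumed
# implementation, not the authors' code). Requires: pip install spacy &&
# python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def invert_subject_object(premise: str):
    """Swap the subject and direct-object subtrees of a transitive premise.

    A sentence generally does not entail its subject/object-inverted
    counterpart, so the generated (premise, hypothesis) pair is labeled
    non-entailment.
    """
    doc = nlp(premise)
    subj = next((t for t in doc if t.dep_ == "nsubj"), None)
    obj = next((t for t in doc if t.dep_ == "dobj"), None)
    if subj is None or obj is None:
        return None  # skip sentences without a clear subject and object

    subj_span = doc[subj.left_edge.i : subj.right_edge.i + 1]
    obj_span = doc[obj.left_edge.i : obj.right_edge.i + 1]
    if subj_span.end > obj_span.start:
        return None  # only handle canonical subject-verb-object order

    # Rebuild the sentence with the two subtrees exchanged
    # (whitespace and casing are left untouched in this sketch).
    tokens = (
        [t.text for t in doc[: subj_span.start]]
        + [t.text for t in obj_span]
        + [t.text for t in doc[subj_span.end : obj_span.start]]
        + [t.text for t in subj_span]
        + [t.text for t in doc[obj_span.end :]]
    )
    hypothesis = " ".join(tokens)
    return hypothesis, "non-entailment"

# e.g. ("the doctor saw The lawyer .", "non-entailment")
print(invert_subject_object("The lawyer saw the doctor."))
```

In this sketch, augmentation pairs such as ("The lawyer saw the doctor.", "the doctor saw The lawyer.") labeled non-entailment would be appended to the MNLI training set before fine-tuning, forcing the model to attend to word order rather than lexical overlap.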
