End-to-End Self-Debiasing Framework for Robust NLU Training

Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases, leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones. We introduce a simple yet effective debiasing framework in which the shallow representations of the main model are used to derive a bias model, and both models are trained simultaneously. We demonstrate on three well-studied NLU tasks that, despite its simplicity, our method leads to competitive OOD results. It significantly outperforms other debiasing approaches on two of the tasks, while still delivering high in-distribution performance.
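To make the framework concrete, below is a minimal PyTorch sketch of the idea: a classifier head over a shallow layer of the encoder serves as the bias model, a head over the top layer serves as the main model, and the two are trained jointly. The abstract does not specify the encoder, the depth of the shallow layer, or the debiasing objective, so the choice of bert-base-uncased, the bias_layer index, and the product-of-experts-style loss combination here are illustrative assumptions, not the authors' exact method.

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class SelfDebiasingModel(nn.Module):
    # Sketch only: one encoder, two classifier heads. The bias head reads a
    # shallow hidden layer; the main head reads the top layer. The layer
    # index and loss below are assumptions, not taken from the abstract.
    def __init__(self, encoder_name="bert-base-uncased", num_labels=3, bias_layer=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.main_head = nn.Linear(hidden, num_labels)  # top-layer classifier
        self.bias_head = nn.Linear(hidden, num_labels)  # shallow-layer classifier
        self.bias_layer = bias_layer

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids, attention_mask=attention_mask,
                           output_hidden_states=True)
        # [CLS] vector of the top layer feeds the main classifier ...
        main_logits = self.main_head(out.hidden_states[-1][:, 0])
        # ... while the [CLS] vector of an early layer feeds the bias classifier.
        bias_logits = self.bias_head(out.hidden_states[self.bias_layer][:, 0])
        return main_logits, bias_logits

def poe_loss(main_logits, bias_logits, labels):
    # Product-of-experts-style combination (one common debiasing objective):
    # examples the shallow bias model already solves contribute little
    # gradient to the main model. The bias term is detached here so the bias
    # head is trained only through its own cross-entropy, jointly with the
    # main model in a single backward pass.
    combined = F.log_softmax(main_logits, dim=-1) \
        + F.log_softmax(bias_logits, dim=-1).detach()
    main_loss = F.cross_entropy(combined, labels)
    bias_loss = F.cross_entropy(bias_logits, labels)
    return main_loss + bias_loss

A training step would compute main_logits, bias_logits = model(input_ids, attention_mask) and backpropagate poe_loss(main_logits, bias_logits, labels) through both heads and the encoder at once; at inference time only main_logits is used. The appeal of deriving the bias model from the main model's own shallow layers is that no separate, hand-designed bias model or prior knowledge of the bias type is required.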
