Improving QA Generalization by Concurrent Modeling of Multiple Biases

Existing NLP datasets contain various biases that models can easily exploit to achieve high performance on the corresponding evaluation sets. However, models that rely on such dataset-specific biases fail to learn more generalizable knowledge about the task from more general data patterns. In this paper, we investigate the impact of debiasing methods on generalization and propose a general framework for improving performance on both in-domain and out-of-domain datasets by concurrently modeling multiple biases in the training data. Our framework weights each example based on the biases it contains and the strength of those biases in the training data. It then uses these weights in the training objective so that the model relies less on examples with high bias weights. We extensively evaluate our framework on extractive question answering with training data from various domains containing multiple biases of different strengths. We perform the evaluations in two settings, in which the model is trained on a single domain or on multiple domains simultaneously, and show its effectiveness in both settings compared to state-of-the-art debiasing methods.
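
To make the reweighting idea concrete, the sketch below shows one plausible way a per-example bias weight could scale the span-prediction loss in extractive QA. This is a minimal illustration, not the paper's exact formulation: the function name, the assumption that bias weights are scalars in [0, 1] produced by bias-only models, and the simple (1 - weight) scaling are all assumptions made here for clarity.

```python
import torch
import torch.nn.functional as F

def bias_weighted_qa_loss(start_logits, end_logits,
                          start_positions, end_positions,
                          bias_weights):
    """Hypothetical bias-weighted loss for extractive QA.

    start_logits, end_logits: (batch, seq_len) span-boundary scores.
    start_positions, end_positions: (batch,) gold answer indices.
    bias_weights: (batch,) scalars in [0, 1]; higher means the example
    is more strongly explained by known biases (assumed to come from
    bias-only models, as in ensemble-based debiasing approaches).
    """
    # Standard span-extraction cross-entropy, kept per example.
    start_loss = F.cross_entropy(start_logits, start_positions, reduction="none")
    end_loss = F.cross_entropy(end_logits, end_positions, reduction="none")
    per_example_loss = 0.5 * (start_loss + end_loss)

    # Down-weight examples that the bias models answer easily, so the
    # main model relies less on them during training.
    weights = 1.0 - bias_weights
    return (weights * per_example_loss).sum() / weights.sum().clamp(min=1e-8)
```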
