Towards Robust Neural Machine Reading Comprehension via Question Paraphrases

In this paper, we focus on the over-sensitivity issue of neural machine reading comprehension (MRC) models. By over-sensitivity, we mean that a neural MRC model gives different answers to question paraphrases that are semantically equivalent. To address this issue, we first create a large-scale Chinese MRC dataset with high-quality question paraphrases generated by a paraphrasing toolkit used in Baidu Search. We then quantitatively analyze the over-sensitivity of neural MRC models on this dataset. Intuitively, if two questions are paraphrases of each other, a robust model should give them the same prediction. Based on this intuition, we propose a regularized BERT-based model that leverages the high-quality question paraphrases to encourage the model to make the same predictions for semantically equivalent inputs. Experimental results show that our approach significantly improves the robustness of a strong BERT-based MRC model and also yields gains in held-out accuracy over the BERT-based baseline. Specifically, the different prediction ratio (DPR) on question paraphrases decreases by more than 10%.
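To make the regularization idea concrete, below is a minimal PyTorch-style sketch of a paraphrase-consistency loss for an extractive MRC model. It assumes the model produces start/end span logits for an original question and its paraphrase over the same passage; the symmetric KL regularizer, its weight `lam`, and the function names are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def symmetric_kl(logits_a, logits_b):
    """Symmetric KL divergence between two span distributions given as logits."""
    log_a = F.log_softmax(logits_a, dim=-1)
    log_b = F.log_softmax(logits_b, dim=-1)
    # F.kl_div(input, target) expects log-probs as input and probs as target,
    # and computes KL(target || exp(input)).
    kl_ab = F.kl_div(log_b, log_a.exp(), reduction="batchmean")  # KL(a || b)
    kl_ba = F.kl_div(log_a, log_b.exp(), reduction="batchmean")  # KL(b || a)
    return 0.5 * (kl_ab + kl_ba)


def consistency_regularized_loss(start_q, end_q, start_p, end_p,
                                 span_loss_q, span_loss_p, lam=1.0):
    """Supervised span loss on the question and its paraphrase, plus a
    consistency term that penalizes diverging span predictions.

    start_q/end_q, start_p/end_p: span logits for the original question
    and its paraphrase; span_loss_q/span_loss_p: standard cross-entropy
    span losses; lam: regularization weight (hypothetical value).
    """
    consistency = symmetric_kl(start_q, start_p) + symmetric_kl(end_q, end_p)
    return span_loss_q + span_loss_p + lam * consistency
```

Under this sketch, the DPR-style evaluation simply compares the argmax spans predicted for each question and its paraphrase and reports the fraction of pairs where they differ; training with the consistency term is what the abstract credits with lowering that ratio.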
