Finding Generalizable Evidence by Learning to Convince Q&A Models

We propose a system that finds the strongest supporting evidence for a given answer to a question, using passage-based question answering (QA) as a testbed. We train evidence agents to select the passage sentences that most convince a pretrained QA model of a given answer, when the QA model receives those sentences instead of the full passage. Rather than finding evidence that convinces one model alone, we find that agents select evidence that generalizes: agent-chosen evidence increases the plausibility of the supported answer, as judged by other QA models and by humans. Given its general nature, this approach improves QA in a robust manner: using agent-selected evidence, (i) humans can correctly answer questions with only ~20% of the full passage, and (ii) QA models can generalize to longer passages and harder questions.
