Frustratingly Hard Evidence Retrieval for QA Over Books

Much progress has been made on question answering (QA) in recent years, but the particular problem of QA over narrative book stories has not been explored in depth. We formulate BookQA as an open-domain QA task, since it similarly depends on evidence retrieval. We further investigate how state-of-the-art open-domain QA approaches can help BookQA. Besides achieving state-of-the-art results on the NarrativeQA benchmark, our study reveals, through a wealth of experiments and analysis, how difficult evidence retrieval in books is. This difficulty calls for future work on novel evidence retrieval solutions for BookQA.

Xiaoxiao Guo | Xiangyang Mou | Saloni Potdar | Hui Su | Mo Yu | Bingsheng Yao | Chenghao Yang