HEAD-QA: A Healthcare Dataset for Complex Reasoning

We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. We then consider monolingual (Spanish) and cross-lingual (to English) experiments with information retrieval and neural techniques. We show that: (i) HEAD-QA challenges current methods, and (ii) the results lag well behind human performance, demonstrating its usefulness as a benchmark for future work.

[1]  Paolo Rosso,et al.  Mining Knowledge fromWikipedia for the Question Answering task , 2006, LREC.

[2]  Yiannis Kompatsiaris,et al.  A Test Collection for Passage Retrieval Evaluation of Spanish Health-Related Resources , 2019, ECIR.

[3]  Percy Liang,et al.  Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.

[4]  Fernando Llopis,et al.  Question Answering in Spanish , 2003, CLEF.

[5]  Guokun Lai,et al.  RACE: Large-scale ReAding Comprehension Dataset From Examinations , 2017, EMNLP.

[6]  Yejin Choi,et al.  SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference , 2018, EMNLP.

[7]  Asma Ben Abacha,et al.  Semantic Analysis and Automatic Corpus Construction for Entailment Recognition in Medical Texts , 2015, AIME.

[8]  Zachary C. Lipton,et al.  How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks , 2018, EMNLP.

[9]  Ioannis A. Kakadiaris,et al.  Results of the sixth edition of the BioASQ Challenge , 2018 .

[10]  Jason Weston,et al.  Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.

[11]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[12]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[13]  Ming Zhou,et al.  Reinforced Mnemonic Reader for Machine Reading Comprehension , 2017, IJCAI.

[14]  Philipp Koehn,et al.  Manual and Automatic Evaluation of Machine Translation between European Languages , 2006, WMT@HLT-NAACL.

[15]  Hwee Tou Ng,et al.  A Nil-Aware Answer Extraction Framework for Question Answering , 2018, EMNLP.

[16]  Peter Clark,et al.  SciTaiL: A Textual Entailment Dataset from Science Question Answering , 2018, AAAI.

[17]  Ali Farhadi,et al.  Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.

[18]  Oren Etzioni,et al.  Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge , 2018, ArXiv.

[19]  Oren Etzioni,et al.  Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions , 2016, AAAI.

[20]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[21]  Jonghyun Choi,et al.  Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Richard Socher,et al.  DCN+: Mixed Objective and Deep Residual Coattention for Question Answering , 2017, ICLR.

[23]  Jason Weston,et al.  Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.

[24]  Asma Ben Abacha,et al.  Recognizing Question Entailment for Medical Question Answering , 2016, AMIA.