Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answering

This paper presents the MEDIQA 2019 shared task organized at the ACL-BioNLP workshop. The shared task is motivated by the need to develop relevant methods, techniques, and gold standards for inference and entailment in the medical domain, and to apply them to improve domain-specific information retrieval and question answering systems. MEDIQA 2019 includes three tasks in the medical domain: Natural Language Inference (NLI), Recognizing Question Entailment (RQE), and Question Answering (QA). Seventy-two teams participated in the challenge, with the best systems achieving an accuracy of 98% on the NLI task, 74.9% on the RQE task, and 78.3% on the QA task. In this paper, we describe the tasks, the datasets, and the participants’ approaches and results. We hope that this shared task will attract further research efforts in textual inference, question entailment, and question answering in the medical domain.
