Answering Questions on COVID-19 in Real-Time

The recent outbreak of the novel coronavirus is wreaking havoc on the world, and researchers are struggling to combat it effectively. One reason the fight is difficult is the lack of information and knowledge. In this work, we outline our effort to help shrink this knowledge vacuum by creating covidAsk, a question answering (QA) system that combines biomedical text mining and QA techniques to provide answers to questions in real time. Our system leverages both supervised and unsupervised approaches to provide informative answers using DenSPI (Seo et al., 2019) and BEST (Lee et al., 2016). We evaluate covidAsk on a manually created dataset called COVID-19 Questions, which is based on facts about COVID-19. We hope our system will aid researchers in their search for knowledge and information, not only for COVID-19 but for future pandemics as well.

[1]  Yu Su,et al.  Document Classification for COVID-19 Literature , 2020, NLPCOVID19.

[2]  Jason Weston,et al.  Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.

[3]  Jaewoo Kang,et al.  Contextualized Sparse Representations for Real-Time Open-Domain Question Answering , 2020, ACL.

[4]  Ruslan Salakhutdinov,et al.  Semi-Supervised QA with Generative Domain-Adaptive Nets , 2017, ACL.

[5]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[6]  Jeff Johnson,et al.  Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.

[7]  Colin Raffel,et al.  Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res.

[8]  Axel-Cyrille Ngonga Ngomo,et al.  BioASQ: A Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering , 2012, AAAI Fall Symposium: Information Retrieval and Knowledge Discovery in Biomedical Text.

[9]  Ali Farhadi,et al.  Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index , 2019, ACL.

[10]  Jaehoon Choi,et al.  BEST: Next-Generation Biomedical Entity Search Tool for Knowledge Discovery from Biomedical Literature , 2016, PloS one.

[11]  Ludovic Denoyer,et al.  Unsupervised Question Answering by Cloze Translation , 2019, ACL.

[12]  Graham Neubig,et al.  Differentiable Reasoning over a Virtual Knowledge Base , 2020, ICLR.

[13]  Richard Socher,et al.  Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering , 2019, ICLR.

[14]  Yonghwa Choi,et al.  A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining , 2019, IEEE Access.

[15]  Jimmy J. Lin,et al.  Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset , 2020, NLPCOVID19.

[16]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[17]  Ali Farhadi,et al.  Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension , 2018, EMNLP.

[18]  Ali Farhadi,et al.  Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.

[19]  Kirk Roberts,et al.  TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19 , 2020, J. Am. Medical Informatics Assoc.

[20]  Kirk Roberts,et al.  TREC-COVID , 2020, SIGIR Forum.

[21]  Oren Etzioni,et al.  CORD-19: The Covid-19 Open Research Dataset , 2020, NLPCOVID19.

[22]  Janu Verma,et al.  Information Retrieval and Extraction on COVID-19 Clinical Articles Using Graph Community Detection and Bio-BERT Embeddings , 2020, NLPCOVID19.

[23]  Jaewoo Kang,et al.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining , 2019, Bioinform.

[24]  Soroush Vosoughi,et al.  What Are People Asking About COVID-19? A Question Classification Dataset , 2020, NLPCOVID19.

[25]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[26]  Jaewoo Kang,et al.  Pre-trained Language Model for Biomedical Question Answering , 2019, PKDD/ECML Workshops.

[27]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.

[28]  Anthony Reina,et al.  COVID-QA: A Question Answering Dataset for COVID-19 , 2020.

[29]  Ming Zhou,et al.  Question Generation for Question Answering , 2017, EMNLP.

[30]  Jaewoo Kang,et al.  Biomedical Entity Representations with Synonym Marginalization , 2020, ACL.

[31]  Hugo Zaragoza,et al.  The Probabilistic Relevance Framework: BM25 and Beyond , 2009, Found. Trends Inf. Retr.

[32]  Ming-Wei Chang,et al.  Latent Retrieval for Weakly Supervised Open Domain Question Answering , 2019, ACL.

[33]  Ming-Wei Chang,et al.  Natural Questions: A Benchmark for Question Answering Research , 2019, TACL.