COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation

To combat COVID-19, both clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in the literature to understand the disease mechanism and the related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence. All of the data, KGs, reports.
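As a rough illustration of the idea of a knowledge graph whose edges carry contextual sentences as evidence, the following minimal Python sketch stores relations and answers a simple lookup question. It is not the COVID-KG implementation: the class names, the relation label `inhibits`, and the entities `drug_A`, `drug_B`, and `protein_X` are hypothetical placeholders, and the evidence strings are dummy text rather than extracted sentences.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relation:
    """One extracted relation plus the sentence that supports it."""
    head: str      # source entity, e.g. a drug (placeholder names below)
    label: str     # relation type (hypothetical label set)
    tail: str      # target entity, e.g. a protein or disease
    evidence: str  # contextual sentence kept as evidence


class KnowledgeGraph:
    """Tiny in-memory graph indexed by tail entity (illustrative only)."""

    def __init__(self) -> None:
        self._by_tail = defaultdict(list)

    def add(self, rel: Relation) -> None:
        self._by_tail[rel.tail].append(rel)

    def query(self, tail: str, label: str) -> List[Relation]:
        # Return every relation with the given label pointing at `tail`.
        return [r for r in self._by_tail.get(tail, []) if r.label == label]


# Placeholder data, not real findings.
kg = KnowledgeGraph()
kg.add(Relation("drug_A", "inhibits", "protein_X", "placeholder sentence 1"))
kg.add(Relation("drug_B", "binds", "protein_X", "placeholder sentence 2"))

# "Which entities inhibit protein_X, and what sentence supports each answer?"
answers = kg.query("protein_X", "inhibits")
for rel in answers:
    print(rel.head, "|", rel.evidence)
```

Keeping the evidence string on each edge means every answer can be returned together with its supporting sentence, which mirrors the evidence-backed reporting described above.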
