Mining a stroke knowledge graph from literature

Background
Stroke has an acute onset and a high mortality rate, making it one of the deadliest diseases worldwide. Its underlying biology and treatments have been widely studied in both “Western” biomedicine and Traditional Chinese Medicine (TCM). However, the two approaches are often studied and reported in isolation, both in the literature and in the associated databases.

Results
To aid research in finding effective prevention methods and treatments, we integrated knowledge from the literature and from a number of databases (e.g. CID, TCMID, ETCM). We applied a suite of biomedical text-mining (named-entity recognition) approaches to identify mentions of genes, diseases, drugs, chemicals, symptoms, Chinese herbs, patent medicines and other entities in a large set of stroke papers from both the biomedical and the TCM domains. Then, combining a rule-based approach with a pre-trained BioBERT model, we extracted and classified the links and relationships among stroke-related entities as expressed in the literature. We constructed StrokeKG, a knowledge graph that includes almost 46k nodes of nine types and 157k links of 30 types, connecting diseases, genes, symptoms, drugs, pathways, herbs, chemicals, ingredients and patent medicines.

Conclusions
StrokeKG can provide practical and reliable stroke-related knowledge to support stroke research, for example by suggesting new research directions and ideas for drug repurposing and discovery. StrokeKG is freely available at http://114.115.208.144:7474/browser/ (click "Connect" directly), and the source structured data for stroke are available at https://github.com/yangxi1016/Stroke.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-021-04292-4.
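The Results paragraph describes a pipeline that recognizes entities, extracts relations between them, and merges the output into a typed graph. The Python sketch below illustrates that flow under heavy simplifying assumptions. A tiny hypothetical lexicon stands in for the dedicated NER tools. Hypothetical trigger-word rules stand in for the rule-based component. The BioBERT relation classifier is not reproduced. The entity types, trigger words and relation labels (`TREATS`, `HAS_SYMPTOM`, `CONTAINS`) are illustrative only and are not StrokeKG's actual schema.

```python
import re
from collections import defaultdict
from itertools import combinations

# HYPOTHETICAL toy lexicon: surface form -> (entity type, canonical name).
# A dictionary lookup stands in for real NER tools; it is not StrokeKG's method.
LEXICON = {
    "alteplase": ("Drug", "alteplase"),
    "ischemic stroke": ("Disease", "ischemic stroke"),
    "hemiplegia": ("Symptom", "hemiplegia"),
    "salvia miltiorrhiza": ("Herb", "Salvia miltiorrhiza"),
    "tanshinone iia": ("Ingredient", "tanshinone IIA"),
}

# HYPOTHETICAL rules: (head type, tail type) -> (trigger words, relation label).
RULES = {
    ("Drug", "Disease"): ({"treats", "treat", "treated", "reduces"}, "TREATS"),
    ("Disease", "Symptom"): ({"causes", "presents", "manifests"}, "HAS_SYMPTOM"),
    ("Herb", "Ingredient"): ({"contains", "contain"}, "CONTAINS"),
}


def tag_entities(sentence):
    """Return non-overlapping (start, end, type, name) mentions sorted by position."""
    text = sentence.lower()
    found = []
    # Try longer terms first so they win over any shorter term they contain.
    for term in sorted(LEXICON, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            if all(m.end() <= s or m.start() >= e for s, e, _, _ in found):
                found.append((m.start(), m.end(), *LEXICON[term]))
    return sorted(found)


def extract_relations(sentence, mentions):
    """Label mention pairs whose intervening text contains a rule's trigger word."""
    text = sentence.lower()
    triples = []
    for a, b in combinations(mentions, 2):  # a precedes b in the sentence
        between = set(re.findall(r"[a-z]+", text[a[1]:b[0]]))
        # Check both directions so "X is treated with Y" still yields Y TREATS X.
        for head, tail in ((a, b), (b, a)):
            rule = RULES.get((head[2], tail[2]))
            if rule and between & rule[0]:
                triples.append((head[3], rule[1], tail[3]))
                break  # at most one relation per pair
    return triples


def build_graph(sentences):
    """Collect typed nodes and deduplicated (head, relation, tail) edges."""
    nodes = defaultdict(set)  # entity type -> canonical names
    edges = set()
    for sentence in sentences:
        mentions = tag_entities(sentence)
        for _, _, etype, name in mentions:
            nodes[etype].add(name)
        edges.update(extract_relations(sentence, mentions))
    return nodes, edges


if __name__ == "__main__":
    corpus = [
        "Alteplase treats acute ischemic stroke when given early.",
        "Ischemic stroke often presents with hemiplegia.",
        "Salvia miltiorrhiza contains tanshinone IIA.",
    ]
    nodes, edges = build_graph(corpus)
    for etype, names in sorted(nodes.items()):
        print(etype, sorted(names))
    for triple in sorted(edges):
        print(triple)
```

Storing edges as a set of (head, relation, tail) triples deduplicates a relation that many papers assert. In a fuller pipeline, a trained classifier such as BioBERT could replace or filter the trigger-word check in `extract_relations`.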
