KGHC: a knowledge graph for hepatocellular carcinoma

Background
Hepatocellular carcinoma is one of the most common malignant neoplasms in adults and has a high mortality rate. Mining relevant medical knowledge from rapidly growing text data and integrating it with existing biomedical resources can support research on hepatocellular carcinoma. To this end, we constructed a knowledge graph for hepatocellular carcinoma (KGHC).

Methods
We propose an approach to building a knowledge graph for hepatocellular carcinoma. Specifically, we first extracted knowledge from both structured and unstructured data. Because the extracted entities may contain noise, we applied a biomedical information extraction system, BioIE, to filter the data in KGHC. We then introduced a fusion method to merge the extracted data. Finally, we stored the data in the Neo4j graph database, which can help researchers analyze the hepatocellular carcinoma network.

Results
KGHC contains 13,296 triples and provides knowledge of hepatocellular carcinoma to healthcare professionals, sparing them from digging through large volumes of biomedical literature. This could improve the efficiency of research on hepatocellular carcinoma. KGHC is freely accessible for academic research at http://202.118.75.18:18895/browser/ .

Conclusions
In this paper, we present a knowledge graph of hepatocellular carcinoma constructed from large amounts of structured and unstructured data. The evaluation results show that the data in KGHC are of high quality.
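The abstract mentions a fusion step but does not say how duplicate entities and triples from different sources are merged. The following is a minimal sketch under stated assumptions, not necessarily the KGHC authors' method: entity names are normalized to sets of lowercase alphanumeric tokens, a name is merged into the first previously seen name whose Jaccard similarity with it reaches a hypothetical threshold of 0.8, relation labels are lowercased, and exact duplicate triples are then dropped. The function names (`tokens`, `jaccard`, `fuse_entities`, `fuse_triples`) and the threshold are illustrative only.

```python
import re


def tokens(name: str) -> frozenset[str]:
    """Normalize an entity name to a set of lowercase alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuse_entities(names: list[str], threshold: float = 0.8) -> dict[str, str]:
    """Greedily map each name to the first earlier canonical name it matches.

    Assumption: the threshold of 0.8 is a hypothetical choice, not a KGHC value.
    """
    canonical: list[tuple[str, frozenset[str]]] = []
    mapping: dict[str, str] = {}
    for name in names:
        toks = tokens(name)
        for rep, rep_toks in canonical:
            if jaccard(toks, rep_toks) >= threshold:
                mapping[name] = rep
                break
        else:
            # No sufficiently similar canonical name: this name starts a new cluster.
            canonical.append((name, toks))
            mapping[name] = name
    return mapping


def fuse_triples(
    triples: list[tuple[str, str, str]], threshold: float = 0.8
) -> list[tuple[str, str, str]]:
    """Rewrite head/tail entities to canonical names and drop duplicate triples."""
    names = [entity for head, _, tail in triples for entity in (head, tail)]
    mapping = fuse_entities(names, threshold)
    fused: list[tuple[str, str, str]] = []
    seen: set[tuple[str, str, str]] = set()
    for head, relation, tail in triples:
        key = (mapping[head], relation.lower(), mapping[tail])
        if key not in seen:
            seen.add(key)
            fused.append(key)
    return fused


if __name__ == "__main__":
    example = [
        ("Hepatocellular carcinoma", "treated_by", "Sorafenib"),
        ("hepatocellular carcinoma", "treated_by", "sorafenib"),
        ("Carcinoma, hepatocellular", "associated_with", "HBV infection"),
    ]
    for triple in fuse_triples(example):
        print(triple)
    # ('Hepatocellular carcinoma', 'treated_by', 'Sorafenib')
    # ('Hepatocellular carcinoma', 'associated_with', 'HBV infection')
```

A single greedy pass keeps the sketch short, but the resulting clusters depend on input order, and surface-string similarity cannot tell synonyms with different wording apart from unrelated names. A fuller pipeline could first map names to a controlled vocabulary such as UMLS concept identifiers and only fall back to string similarity for unmapped entities.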
