Using predicate and provenance information from a knowledge graph for drug efficacy screening

Background: Biomedical knowledge graphs have become important tools to computationally analyse the comprehensive body of biomedical knowledge. They represent knowledge as subject-predicate-object triples, in which the predicate indicates the relationship between subject and object. A triple can also contain provenance information, which consists of references to the sources of the triple (e.g. scientific publications or database entries). Knowledge graphs have been used to classify drug-disease pairs for drug efficacy screening, but existing computational methods have often ignored predicate and provenance information. Using this information, we aimed to develop a supervised machine learning classifier and to determine the added value of predicate and provenance information for drug efficacy screening. To ensure the biological plausibility of our method, we performed our research at the protein level, where drugs are represented by their drug target proteins and diseases by their disease proteins.

Results: Using random forests with repeated 10-fold cross-validation, our method achieved an area under the ROC curve (AUC) of 78.1% and 74.3% for two reference sets. We benchmarked against a state-of-the-art knowledge-graph technique that does not use predicate and provenance information, obtaining AUCs of 65.6% and 64.6%, respectively. Classifiers that used only predicate information outperformed classifiers that used only provenance information, but classifiers that used both performed best.

Conclusion: We conclude that both predicate and provenance information provide added value for drug efficacy screening.
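To make the triple representation concrete, the following is a minimal Python sketch of subject-predicate-object triples that carry provenance, and of one possible way to turn them into features for a drug-disease pair at the protein level. It is an illustration only, not the authors' implementation: the names `Triple` and `pair_features`, the protein identifiers, the predicate labels, and the choice to count only direct triples between drug target proteins and disease proteins are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Triple:
    """A knowledge-graph triple with provenance (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str
    # References to the sources of the triple, e.g. publications or database entries.
    provenance: FrozenSet[str] = frozenset()


def pair_features(
    triples: Iterable[Triple],
    drug_targets: Set[str],
    disease_proteins: Set[str],
) -> Dict[str, int]:
    """Count, per predicate, the direct triples linking a drug target protein
    to a disease protein (in either direction) and the number of provenance
    references attached to those triples.

    This feature definition is an assumption for illustration only.
    """
    features: Counter = Counter()
    for t in triples:
        forward = t.subject in drug_targets and t.obj in disease_proteins
        backward = t.subject in disease_proteins and t.obj in drug_targets
        if forward or backward:
            features[f"predicate:{t.predicate}"] += 1
            features[f"provenance:{t.predicate}"] += len(t.provenance)
    return dict(features)


# Toy data: identifiers and predicates are made up for the example.
triples = [
    Triple("P1", "interacts_with", "P2", frozenset({"PMID:1", "PMID:2"})),
    Triple("P3", "inhibits", "P1", frozenset({"DB:entry1"})),
    Triple("P1", "interacts_with", "P4", frozenset({"PMID:3"})),
]
features = pair_features(triples, drug_targets={"P1"}, disease_proteins={"P2", "P3"})
```

A feature dictionary of this kind, built for every drug-disease pair in a reference set, could then serve as one row of input to a supervised classifier such as a random forest.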
