A Tissue-Specific and Toxicology-Focused Knowledge Graph

Molecular biology-focused knowledge graphs (KGs) are directed graphs that integrate information from heterogeneous sources of biological and biomedical data, such as ontologies and public databases. They provide a holistic view of biology, chemistry, and disease, allowing users to draw non-obvious connections between concepts through shared associations. While these massive graphs are constructed using carefully curated ontologies and annotations from public databases, much of the information relating the concepts is context-specific. Two important variables that determine whether a given ontology annotation applies are the species and, especially, the tissue type in which the annotated process takes place. Using a data-driven approach and the results from thousands of high-quality gene expression samples, we have constructed tissue-specific KGs (using liver, kidney, and heart as examples) that empirically validate the annotations provided by ontology curators. The resulting human-centered KGs are designed for toxicology applications but generalize to other areas of human biology, addressing the lack of tissue specificity that often limits the applicability of other large KGs. These KGs can serve as valuable tools for generating transparent explanations of experimental results, in the form of mechanistic hypotheses that are highly relevant to the studied tissue. Because the data-driven relations are derived from a large collection of human in vitro data, these KGs are particularly well suited for in vitro toxicology applications.
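
To make the idea of tissue-specific filtering concrete, the following is a minimal sketch of one simple way curated annotations could be checked against expression data. It is not the authors' method, which the abstract does not detail. The function name `tissue_specific_edges`, the median-expression threshold `min_median`, the edge label `participates_in`, and all expression values are hypothetical and chosen only for illustration.

```python
from statistics import median


def tissue_specific_edges(edges, expression, tissue, min_median=1.0):
    """Keep (gene, relation, term) edges whose gene is expressed in `tissue`.

    Assumption (not from the source): a gene counts as "expressed" when its
    median expression across the tissue's samples is at least `min_median`.
    """
    kept = []
    for gene, relation, term in edges:
        samples = expression.get((gene, tissue), [])
        if samples and median(samples) >= min_median:
            kept.append((gene, relation, term))
    return kept


# Toy curated annotations (directed edges from gene to ontology term).
edges = [
    ("CYP3A4", "participates_in", "xenobiotic metabolic process"),
    ("MYH6", "participates_in", "cardiac muscle contraction"),
]

# Made-up expression values per (gene, tissue), for illustration only.
expression = {
    ("CYP3A4", "liver"): [120.0, 95.5, 130.2],
    ("CYP3A4", "heart"): [0.1, 0.0, 0.3],
    ("MYH6", "liver"): [0.0, 0.2, 0.1],
    ("MYH6", "heart"): [310.0, 280.4, 295.1],
}

liver_kg = tissue_specific_edges(edges, expression, "liver")
heart_kg = tissue_specific_edges(edges, expression, "heart")
print(liver_kg)
print(heart_kg)
```

In this toy setting, each tissue-specific graph keeps only the annotation whose gene is expressed in that tissue. A real pipeline would need a statistically grounded criterion rather than a fixed threshold.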
