Biological applications of knowledge graph embedding models

Complex biological systems are traditionally modelled as graphs of interconnected biological entities. These graphs, known as biological knowledge graphs, are then processed with graph exploratory approaches to perform different analytical and predictive tasks. Although these approaches achieve high predictive accuracy, their scalability is limited by their dependence on time-consuming path exploration procedures. In recent years, owing to rapid advances in computational technologies, new approaches have emerged that model and mine graphs with both high accuracy and scalability. These approaches, known as knowledge graph embedding (KGE) models, learn low-rank vector representations of graph nodes and edges that preserve the graph's inherent structure. They have been used to analyse knowledge graphs from different domains, where they showed superior performance and accuracy compared with previous graph exploratory approaches. In this work, we study this class of models in the context of biological knowledge graphs and their applications. We show how KGE models can be a natural fit for representing complex biological knowledge modelled as graphs, and we discuss their predictive and analytical capabilities in different biological applications. We present two case studies that demonstrate these capabilities: prediction of drug-target interactions and prediction of polypharmacy side effects. Finally, we analyse practical considerations for KGE models and discuss the opportunities and challenges of adopting them for modelling biological systems.
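To make the idea of learning vector representations of nodes and edges concrete, the following is a minimal sketch of a translation-based (TransE-style) embedding model trained on a toy drug-target graph. It is an illustration under stated assumptions, not the method used in this work: the entity names, the `targets` relation, the embedding size, and the training settings are all hypothetical.

```python
import random

random.seed(0)
DIM = 8  # embedding size (illustrative assumption)

# Hypothetical toy knowledge graph of (head, relation, tail) triples.
entities = ["drug_A", "drug_B", "protein_X", "protein_Y"]
relations = ["targets"]
triples = [("drug_A", "targets", "protein_X"),
           ("drug_B", "targets", "protein_Y")]


def init(names):
    """Assign each name a small random vector."""
    return {n: [random.uniform(-0.5, 0.5) for _ in range(DIM)] for n in names}


ent = init(entities)
rel = init(relations)


def score(h, r, t):
    """TransE-style score: negative squared distance between h + r and t."""
    return -sum((ent[h][i] + rel[r][i] - ent[t][i]) ** 2 for i in range(DIM))


def negatives(h, r, t):
    """Candidate corrupted tails: entities that do not form a known triple."""
    known = set(triples)
    return [e for e in entities if e not in (h, t) and (h, r, e) not in known]


def hinge_loss(margin=1.0):
    """Margin ranking loss summed over every (true, corrupted) pair."""
    total = 0.0
    for h, r, t in triples:
        for t_neg in negatives(h, r, t):
            total += max(0.0, margin + score(h, r, t_neg) - score(h, r, t))
    return total


def train(epochs=200, lr=0.05, margin=1.0):
    """Plain SGD on the margin loss with one sampled corrupted tail per triple."""
    for _ in range(epochs):
        for h, r, t in triples:
            t_neg = random.choice(negatives(h, r, t))
            if margin + score(h, r, t_neg) - score(h, r, t) <= 0:
                continue  # margin already satisfied, no update needed
            for i in range(DIM):
                g_pos = 2 * (ent[h][i] + rel[r][i] - ent[t][i])
                g_neg = 2 * (ent[h][i] + rel[r][i] - ent[t_neg][i])
                ent[h][i] -= lr * (g_pos - g_neg)
                rel[r][i] -= lr * (g_pos - g_neg)
                ent[t][i] += lr * g_pos
                ent[t_neg][i] -= lr * g_neg


def rank_tails(h, r):
    """Rank all other entities as candidate tails, best score first."""
    candidates = [e for e in entities if e != h]
    return sorted(candidates, key=lambda e: score(h, r, e), reverse=True)


loss_before = hinge_loss()
train()
loss_after = hinge_loss()
print("loss before:", loss_before, "loss after:", loss_after)
print("ranked targets for drug_A:", rank_tails("drug_A", "targets"))
```

Scoring a candidate triple costs one vector operation rather than a path search over the graph, which is the source of the scalability advantage described above. Real applications would rely on an established KGE library and a much larger graph.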
