Machine learning with biomedical ontologies

Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are used in almost every major biological database. Increasingly, ontologies are also used to provide background knowledge in similarity-based analysis and machine learning models. The methods that combine ontologies and machine learning are still novel and under active development. We provide an overview of the methods that use ontologies to compute similarity and that incorporate ontologies in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in biomedical ontologies, and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.

Key points

Ontologies provide background knowledge that can be exploited in machine learning models.

Ontology embeddings are structure-preserving maps from ontologies into vector spaces and provide an important method for utilizing ontologies in machine learning.

Embeddings can preserve different structures in ontologies, including their graph structures, syntactic regularities, or their model-theoretic semantics.

Axioms in ontologies, in particular those involving negation, can be used as constraints in optimization and machine learning to reduce the search space.
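To make the idea of a semantic similarity measure concrete, the following is a minimal sketch of one widely used information-content-based measure (Resnik similarity) over a toy class hierarchy. It is not taken from the accompanying notebooks: the ontology `PARENTS`, the annotation table `ANNOTATIONS`, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology: each class maps to its direct superclasses.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}

# Hypothetical annotations: each entity maps to the classes it is annotated with.
ANNOTATIONS = {
    "e1": ["A1"],
    "e2": ["A2"],
    "e3": ["B"],
    "e4": ["A"],
}


def ancestors(cls):
    """Return cls together with all of its (transitive) superclasses."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """Estimate IC(c) = log(N / n_c), where n_c counts the entities annotated
    with c or any of its subclasses (annotations propagate to superclasses)."""
    counts = {cls: 0 for cls in PARENTS}
    for classes in ANNOTATIONS.values():
        closure = set()
        for cls in classes:
            closure |= ancestors(cls)
        for cls in closure:
            counts[cls] += 1
    total = len(ANNOTATIONS)
    return {cls: math.log(total / n) for cls, n in counts.items() if n > 0}


def resnik(c1, c2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(c1) & ancestors(c2)
    return max(ic.get(cls, 0.0) for cls in common)


ic = information_content()
sim_siblings = resnik("A1", "A2", ic)  # share the specific ancestor "A"
sim_distant = resnik("A1", "B", ic)    # share only "root"
```

In this sketch, classes that share a specific common superclass score higher than classes that share only the root, which is the sense in which the measure exploits the background knowledge encoded in the hierarchy.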
