Ontology-based validation and identification of regulatory phenotypes

Motivation: Function annotations of gene products and phenotype annotations of genotypes provide valuable information about molecular mechanisms. Computational methods can use this information to identify functional and phenotypic relatedness, improve our understanding of disease and pathobiology, and support the discovery of drug targets. Identifying functions and phenotypes commonly requires experiments that are time-consuming and expensive to carry out. Creating the annotations also requires a curator to make an assertion based on the reported evidence. Two things would greatly improve the utility of function annotations: methods to validate the mutual consistency of function and phenotype annotations, and a computational method to predict phenotypes from function annotations.

Results: We developed a novel ontology-based method to validate the mutual consistency of function and phenotype annotations. We applied the method to mouse and human annotations and identified several inconsistencies whose resolution would improve overall annotation quality. The method can also be applied to the rule-based prediction of phenotypes from functions. We show that the predicted phenotypes can be used to identify protein-protein interactions and gene-disease associations. Based on experimental function annotations, we predict phenotypes for 1,986 genes in mouse and 7,301 genes in human for which no experimental phenotypes have yet been determined.

Availability: https://github.com/bio-ontology-research-group/phenogocon

Contact: robert.hoehndorf@kaust.edu.sa
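The abstract does not spell out the rules used for validation and prediction. The Python sketch below illustrates one plausible form of rule-based consistency checking and phenotype prediction. Its assumptions are not taken from the paper. Annotations are plain (gene, direction, process) tuples. A single hypothetical rule states that losing a positive regulator of a process yields a "decreased process" phenotype, and losing a negative regulator yields an "increased process" phenotype. An observed phenotype that points the opposite way is flagged as inconsistent. Predicted phenotypes for genes without any observed phenotype are reported as new predictions. The gene names and processes in the usage example are placeholders. An ontology-based method would reason over ontology classes rather than match strings.

```python
"""Toy consistency check between regulation functions and loss-of-function phenotypes.

Illustrative assumptions only: annotations are (gene, direction, process) tuples,
and the rule below stands in for ontology-based reasoning over real classes.
"""

# Hypothetical rule: phenotype direction expected when a regulator of a process is lost.
EXPECTED_LOSS_PHENOTYPE = {"positive": "decreased", "negative": "increased"}
OPPOSITE = {"increased": "decreased", "decreased": "increased"}


def predict_phenotypes(functions):
    """Map (gene, regulation, process) function annotations to predicted phenotypes."""
    return {
        (gene, EXPECTED_LOSS_PHENOTYPE[regulation], process)
        for gene, regulation, process in functions
    }


def find_inconsistencies(functions, phenotypes):
    """Return observed phenotypes whose direction contradicts a predicted phenotype."""
    predicted = predict_phenotypes(functions)
    conflicts = []
    for gene, direction, process in sorted(phenotypes):
        # A conflict exists if the rule predicts the opposite change for the same process.
        if (gene, OPPOSITE[direction], process) in predicted:
            conflicts.append((gene, direction, process))
    return conflicts


def novel_predictions(functions, phenotypes):
    """Return predicted phenotypes for genes that have no observed phenotype at all."""
    annotated_genes = {gene for gene, _, _ in phenotypes}
    return sorted(p for p in predict_phenotypes(functions) if p[0] not in annotated_genes)


if __name__ == "__main__":
    # Placeholder genes and processes, not real annotations.
    functions = [
        ("geneA", "positive", "insulin secretion"),
        ("geneB", "negative", "T cell proliferation"),
        ("geneC", "positive", "angiogenesis"),
    ]
    phenotypes = [
        ("geneA", "increased", "insulin secretion"),     # contradicts the rule
        ("geneB", "increased", "T cell proliferation"),  # consistent with the rule
    ]
    print(find_inconsistencies(functions, phenotypes))
    # [('geneA', 'increased', 'insulin secretion')]
    print(novel_predictions(functions, phenotypes))
    # [('geneC', 'decreased', 'angiogenesis')]
```

The consistency check and the prediction step share the same rule table. This mirrors the abstract's point that one method serves both validation and rule-based prediction.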