The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species

Correlating phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede progress: patient phenotypes may not match known diseases, candidate variants may lie in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype–phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms, and advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data offers a breadth of knowledge not available from any individual source and allows data to be contextualized back to those sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype–phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.
