The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species

Abstract In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and supports powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regard to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.

Tudor Groza | Damian Smedley | Simon Jupp | David Osumi-Sutherland | Julius O. B. Jacobsen | Sebastian Köhler | Nicolas Matentzoglu | Monica C Munoz-Torres | Xingmin Aaron Zhang | Seth Carbon | Matthew H. Brush | Deepak Unni | Erik Segerdell | Suzanna E Lewis | Valentina Cipriani | Julie A McMurry | Anne Thessen | Nathan Dunn | Petra Fey | Daniel Keith | Nomi L Harris | Paola Roncaglia | Tom Conlin | Shahim Essaid | Vida Ravanmehr | Nicole Vasilevsky | Hannah Blau | Clare Pilgrim | Christian A. Grove | Ada Hamosh | James Seager | Alayne Cuzick | Michael A. Gargano | Sofia M. C. Robb | Justin Reese | Morgan Similuk | Susan M Bello | Craig McNamara | James P Balhoff | Leigh Carmody | Ingo Helbig | Tim Putman | Melissa A Haendel | Christopher J Mungall | Peter N Robinson | Kent A. Shefchek | Maria G Della Rocca | Courtney Thaxon | Maureen Hoatlin | Midori Harris | Larry Babb | Yvonne Bradford | Lauren E Chan | Jean-Phillipe Gourdine | Marcin Joachimiak | Kenneth B Lett | Zoë M Pendlington | Erin Riggs | Andrea L Storm
