The GA4GH Phenopacket schema: A computable representation of clinical data for precision medicine

Despite great strides in the development and wide acceptance of standards for exchanging structured information about genomic variants, there is no corresponding standard for exchanging phenotypic data, and this has impeded the sharing of phenotypic information for computational analysis. Here, we introduce the Global Alliance for Genomics and Health (GA4GH) Phenopacket schema, which supports exchange of computable longitudinal case-level phenotypic information for diagnosis and research of all types of disease including Mendelian and complex genetic diseases, cancer, and infectious diseases. To support translational research, diagnostics, and personalized healthcare, phenopackets are designed to be used across a comprehensive landscape of applications including biobanks, databases and registries, clinical information systems such as Electronic Health Records, genomic matchmaking, diagnostic laboratories, and computational tools. The Phenopacket schema is a freely available, community-driven standard that streamlines exchange and systematic use of phenotypic data and will facilitate sophisticated computational analysis of both clinical and genomic information to help improve our understanding of diseases and our ability to manage them.
