Phen2Gene: rapid phenotype-driven gene prioritization for rare diseases

Abstract: Human Phenotype Ontology (HPO) terms are increasingly used in diagnostic settings to help characterize patient phenotypes. The HPO annotation database is updated frequently and provides detailed phenotype knowledge on a wide range of human diseases, and many HPO terms are now mapped to candidate causal genes through binary relationships. To further improve the genetic diagnosis of rare diseases, we incorporated these HPO annotations, gene–disease databases and gene–gene databases into a probabilistic model to build a novel HPO-driven gene prioritization tool, Phen2Gene. Phen2Gene relies on a database built from this information, the HPO2Gene Knowledgebase (H2GKB), which provides a weighted and ranked gene list for every HPO term. Given a patient-specific list of HPO terms or a PhenoPacket description supported by GA4GH (http://phenopackets.org/), Phen2Gene queries the H2GKB, calculates a prioritized gene list based on the probabilistic model, and outputs gene–disease relationships with high accuracy. Phen2Gene outperforms existing gene prioritization tools in speed and acts as a real-time phenotype-driven gene prioritization tool to aid the clinical diagnosis of rare undiagnosed diseases. In addition to a command line tool released under the MIT license (https://github.com/WGLab/Phen2Gene), we also developed a web server and web service (https://phen2gene.wglab.org/) for running the tool via a web interface or RESTful API queries. Finally, we curated a large benchmarking data set for phenotype-to-gene tools, comprising 197 patients from 76 scientific articles and de-identified HPO term data for 85 patients from the Children’s Hospital of Philadelphia.
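The lookup-and-combine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the Phen2Gene implementation and does not reproduce its probabilistic model: it assumes a simple weighted sum over per-term gene weights. The names `TOY_H2GKB` and `prioritize`, the optional `term_weights` parameter, the placeholder gene names, and all numeric weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the H2GKB: each HPO term ID maps to a
# weighted gene list. Gene names and weights are invented placeholders,
# not values from the real knowledgebase.
TOY_H2GKB = {
    "HP:0001250": {"GENE_A": 0.9, "GENE_B": 0.7, "GENE_C": 0.4},
    "HP:0001263": {"GENE_C": 0.8, "GENE_B": 0.5, "GENE_D": 0.3},
    "HP:0001252": {"GENE_C": 0.6, "GENE_E": 0.5},
}


def prioritize(hpo_terms, kb, term_weights=None):
    """Rank genes by summing per-term gene weights (an assumed scheme).

    hpo_terms    -- iterable of HPO term IDs describing one patient
    kb           -- dict: HPO term ID -> {gene: weight}
    term_weights -- optional dict: HPO term ID -> weight (default 1.0)
    """
    scores = defaultdict(float)
    for term in hpo_terms:
        w = 1.0 if term_weights is None else term_weights.get(term, 1.0)
        # Terms missing from the knowledgebase contribute nothing.
        for gene, gene_weight in kb.get(term, {}).items():
            scores[gene] += w * gene_weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    patient_terms = ["HP:0001250", "HP:0001263", "HP:0001252"]
    for rank, (gene, score) in enumerate(prioritize(patient_terms, TOY_H2GKB), 1):
        print(rank, gene, round(score, 3))
```

In this sketch a gene that appears under several of the patient's terms accumulates a higher score, which is one simple way a per-term ranked list could be turned into a patient-level ranking; the actual tool may weight terms and genes differently.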
