Incomplete annotation of OMIM genes is likely to be limiting the diagnostic yield of genetic testing, particularly for neurogenetic disorders

Although the increasing use of whole-exome and whole-genome sequencing has improved the yield of genetic testing for Mendelian disorders, an estimated 50% of patients still leave the clinic without a genetic diagnosis. This can be attributed in part to our limited ability to accurately interpret the genetic variation detected through next-generation sequencing. Variant interpretation is fundamentally reliant on accurate and complete gene annotation; however, numerous reports and discrepancies between gene annotation databases reveal that our knowledge of gene annotation remains far from comprehensive. Here, we detect and validate transcription in an annotation-agnostic manner across all 41 different GTEx tissues, then connect novel transcription to known genes, ultimately improving the annotation of 63% of the known OMIM-morbid genes. We find the majority of novel transcription to be tissue-specific in origin, with brain tissues being most susceptible to misannotation. Furthermore, we find that novel transcribed regions tend to be poorly conserved across species but are significantly depleted for genetic variation within humans, suggesting that they are functionally significant and potentially have human-specific functions. We present our findings through an online platform, vizER, which enables individual genes to be visualised and queried for evidence of misannotation. We also release all tissue-specific transcriptomes in BED format for ease of integration with whole-genome sequencing data. We anticipate that these resources will improve the diagnostic yield for a wide range of Mendelian disorders.
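As an illustration of how BED-formatted transcribed regions might be combined with whole-genome sequencing data, the following minimal sketch flags variants that fall inside a set of BED intervals. It is not the authors' pipeline: the function names `load_bed` and `in_regions` are hypothetical, and the intervals and variant positions are made-up toy values rather than content from the released files. It assumes standard BED coordinates (0-based, half-open) and 1-based variant positions, as in VCF.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED text lines into sorted, merged intervals per chromosome."""
    raw = defaultdict(list)
    for line in lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(regions, chrom, pos_1based):
    """Return True if a 1-based position lies inside any interval."""
    intervals = regions.get(chrom, [])
    pos0 = pos_1based - 1  # convert to the 0-based BED coordinate system
    # Find the last interval whose start is <= pos0.
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


# Toy data (hypothetical, for illustration only).
toy_bed = [
    "chr1\t100\t200",
    "chr1\t150\t300",
    "chr2\t50\t60",
]
toy_variants = [("chr1", 100), ("chr1", 101), ("chr1", 301), ("chr2", 55)]

regions = load_bed(toy_bed)
hits = [v for v in toy_variants if in_regions(regions, *v)]
print(hits)
```

Merging the intervals first keeps each lookup to a single binary search, which matters when many variants are checked against a large set of regions. For real data, an established interval tool would likely be a better choice than this hand-rolled lookup.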
