High‐throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE

Motivation: The accurate interpretation of genetic variants is critical for characterizing genotype‐phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains.

Results: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (‘Assessing Changes to Exons’) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene‐structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA‐seq reads, and can be used to improve read alignment and transcript quantification when an individual‐specific genome sequence is available. Using publicly available RNA‐seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense‐mediated decay, and we show that predicted loss‐of‐function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population‐based resequencing projects, particularly for assessing the joint impact of all variants at a locus.

Availability and Implementation: ACE is written in open‐source C++ and Perl and is available from geneprediction.org/ACE.

Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu

Supplementary information: Supplementary information is available at Bioinformatics online.
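The first step named in the Results, converting phased genotype calls into explicit haplotype sequences, can be illustrated with a minimal sketch. This is not ACE's implementation (ACE is written in C++ and Perl). The function name `apply_phased_variants`, the tuple layout of a variant, and the 0-based coordinates are assumptions made for illustration only. The sketch handles a single diploid sample with biallelic, non-overlapping variants.

```python
from typing import List, Tuple

# Hypothetical variant record (an assumption for this sketch, not ACE's format):
# (0-based position, reference allele, alternate allele, phased genotype such as "0|1")
Variant = Tuple[int, str, str, str]


def apply_phased_variants(reference: str, variants: List[Variant]) -> Tuple[str, str]:
    """Build the two haplotype sequences implied by phased genotype calls."""
    haplotypes = []
    for hap_index in (0, 1):
        pieces = []
        cursor = 0  # next reference position not yet copied to the haplotype
        for pos, ref, alt, genotype in sorted(variants):
            allele = genotype.split("|")[hap_index]
            if allele != "1":
                continue  # this haplotype carries the reference allele here
            if pos < cursor:
                raise ValueError(f"overlapping variants at position {pos}")
            if reference[pos:pos + len(ref)] != ref:
                raise ValueError(f"REF allele does not match reference at {pos}")
            pieces.append(reference[cursor:pos])  # unchanged stretch before the variant
            pieces.append(alt)                    # substituted, inserted, or shortened allele
            cursor = pos + len(ref)               # skip the replaced reference bases
        pieces.append(reference[cursor:])
        haplotypes.append("".join(pieces))
    return haplotypes[0], haplotypes[1]


if __name__ == "__main__":
    ref_seq = "ACGTACGT"
    calls = [(1, "C", "T", "0|1"), (4, "AC", "A", "1|0")]
    hap_a, hap_b = apply_phased_variants(ref_seq, calls)
    print(hap_a)
    print(hap_b)
```

Because insertions and deletions change sequence length, a real pipeline would also need to record how reference coordinates shift on each haplotype so that transcript annotations can be mapped onto the new sequences. That bookkeeping is omitted here.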
