A novel splicing outcome reveals more than 2000 new mammalian protein isoforms

MOTIVATION: We recently characterized an instance of alternative splicing in which the variant differs from the canonical gene transcript by deletion of a length of sequence not divisible by three, but in which translation can be rescued by an alternative start codon. The result is a predicted protein whose amino terminus differs markedly in sequence from the known protein product(s), because it is translated from an alternative reading frame. Automated pipelines have annotated thousands of splice variants but have overlooked these protein isoforms, leaving them underrepresented in current databases.

RESULTS: Here we describe 1849 human and 733 mouse transcripts that can be translated from an alternate ATG. Of these, >80% have not been annotated previously. We identify those conserved between the human and mouse genomes (and hence likely under evolutionary selection). We provide mass spectrometry evidence for translation of selected transcripts. Only one of the described splice variants has previously been studied in detail; in that case, the splicing converted the encoded protein from an activator of cell function to a suppressor, demonstrating that these splice variants can result in profound functional change. We investigate the potential functional effects of this splicing using a variety of bioinformatic tools. The 2582 variants we describe are involved in a wide variety of biological processes and therefore open many new avenues of research.
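
The splicing outcome described in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the function names, the 30-nucleotide sequence, and the selection criteria (an upstream ATG whose frame matches the canonical frame downstream of the deletion, with no stop codon before the splice junction) are assumptions chosen only to show how a deletion of length not divisible by three can be rescued by an alternative start codon.

```python
# Toy illustration only: all names and the sequence below are hypothetical.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")  # standard genetic code, TCAG order
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"TAA", "TAG", "TGA"}

# Canonical toy transcript: ATG GAT GGC TCC AAG GTG CAC TTC GAA TGA
TOY_CANONICAL = "ATGGATGGCTCCAAGGTGCACTTCGAATGA"


def splice(canonical, del_start, del_end):
    """Return the variant transcript lacking canonical[del_start:del_end]."""
    return canonical[:del_start] + canonical[del_end:]


def translate(seq, start):
    """Translate from `start` until the first stop codon or the end of `seq`."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def rescuing_starts(canonical, cds_start, del_start, del_end):
    """List upstream ATGs in the variant that read into the canonical frame
    downstream of the deletion (assumed criteria, see lead-in)."""
    del_len = del_end - del_start
    if del_len % 3 == 0:
        return []  # reading frame is preserved, so no rescue is needed
    variant = splice(canonical, del_start, del_end)
    # Downstream codons shift left by del_len in variant coordinates.
    target_frame = (cds_start - del_len) % 3
    hits = []
    for i in range(0, del_start - 2):  # ATG must lie fully upstream of the junction
        if variant[i:i + 3] != "ATG" or i % 3 != target_frame:
            continue
        # Reject candidates that hit a stop codon before reaching the junction.
        if any(variant[j:j + 3] in STOPS for j in range(i, del_start, 3)):
            continue
        hits.append(i)
    return hits


if __name__ == "__main__":
    variant = splice(TOY_CANONICAL, 12, 17)  # 5-nt deletion, not divisible by 3
    for start in rescuing_starts(TOY_CANONICAL, 0, 12, 17):
        print(start, translate(variant, start))
    print("canonical:", translate(TOY_CANONICAL, 0))
```

In this toy case the canonical product and the rescued product share their carboxy-terminal residues but differ at the amino terminus, because the rescued isoform starts at an ATG in a different reading frame of the first exon. A real analysis would also need to consider factors such as the start-codon sequence context and experimental evidence of translation, which this sketch ignores.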
