Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level

Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectrometry experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectrometry. We found fewer splice events than would be expected: we identified peptides for almost 64% of human protein-coding genes, but detected just 282 splice events. These data suggest that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two-thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved: all the homologous exons we identified evolved over 460 million years ago, and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles.
