The utility of mass spectrometry-based proteomic data for validation of novel alternative splice forms reconstructed from RNA-Seq data: a preliminary assessment

Background
Most mass spectrometry (MS)-based proteomic studies depend on searching acquired tandem mass (MS/MS) spectra against databases of known protein sequences. In these experiments, however, a large number of high-quality spectra remain unassigned. Some of these spectra may correspond to novel peptides that are absent from the database, in particular peptides derived from novel alternative splice (AS) forms. Fast and comprehensive profiling of mammalian transcriptomes by deep sequencing (RNA-Seq) has recently become possible, and MS-based proteomics could potentially serve as an aid for protein-level validation of novel AS events observed in RNA-Seq data.

Results
In this work, we used publicly available mouse tissue proteomic and RNA-Seq datasets to examine the feasibility of identifying novel AS forms from MS data by searching MS/MS spectra against translated mRNA sequences derived from the RNA-Seq data. We observed a significant correlation between the likelihood of identifying a peptide from MS/MS data and the number of RNA-Seq reads for the same gene. In silico experiments also showed that only a fraction of the novel AS forms identified from RNA-Seq data had a corresponding junction peptide compatible with MS/MS sequencing. The number of novel peptides actually identified from MS/MS spectra was substantially lower than the number expected from the in silico analysis.

Conclusions
In the dataset analyzed, the ability to confirm novel AS forms from MS/MS data was quite limited. This can be explained in part by the low abundance of many novel transcripts, whose protein products are present at levels below the limit of detection by MS.
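The in silico check described under Results (whether a novel AS form yields a junction-spanning peptide that is compatible with MS/MS sequencing) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The abstract does not state the digestion rule or peptide criteria, so the simple trypsin rule (cleave after K or R unless followed by P), the 7 to 30 residue length window, the missed-cleavage allowance, the residue-coordinate convention for the junction, and the helper names `tryptic_peptides` and `junction_peptides` are all hypothetical assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical helpers illustrating an in silico junction-peptide check.
# Assumptions (not taken from the study): the simple trypsin rule
# "cleave after K/R unless followed by P", a 7-30 residue length window,
# and 0-based residue coordinates on the translated transcript.


def tryptic_peptides(seq: str, max_missed: int = 0) -> List[Tuple[int, int, str]]:
    """Return (start, end, peptide) tuples for an in silico tryptic digest.

    Coordinates are 0-based and end-exclusive. Peptides containing up to
    `max_missed` internal cleavage sites are included.
    """
    # Each match ends just after a K or R that is not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", seq)]
    bounds = sorted({0, len(seq), *sites})
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages inside the peptide.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            start, end = bounds[i], bounds[j]
            peptides.append((start, end, seq[start:end]))
    return peptides


def junction_peptides(seq: str, junction: int, min_len: int = 7,
                      max_len: int = 30, max_missed: int = 0) -> List[str]:
    """Return tryptic peptides that span a splice junction and fit the length window.

    `junction` is the index of the first residue encoded by the downstream
    exon, so a spanning peptide must contain at least one residue from
    each side of the exon-exon boundary.
    """
    return [pep for start, end, pep in tryptic_peptides(seq, max_missed)
            if start < junction < end and min_len <= len(pep) <= max_len]


if __name__ == "__main__":
    # Toy translated sequence; the downstream exon is assumed to start at residue 11.
    seq = "MADKLVGSTEWNPQFYRGGK"
    print(junction_peptides(seq, 11))  # ['LVGSTEWNPQFYR']
    # A junction exactly at a cleavage site yields no spanning peptide
    # unless a missed cleavage is allowed.
    print(junction_peptides(seq, 4))                # []
    print(junction_peptides(seq, 4, max_missed=1))  # ['MADKLVGSTEWNPQFYR']
```

Under these assumptions, a junction that coincides with a cleavage site (when no missed cleavages are allowed), or that lies in a peptide outside the length window, produces no compatible junction peptide, so the corresponding AS form could not be confirmed by a database search regardless of protein abundance.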
