Detecting outlier peptides in quantitative high-throughput mass spectrometry data.

Quantitative high-throughput mass spectrometry has become an established tool for measuring relative gene expression proteome-wide. The output of such an experiment usually consists of a list of expression ratios (fold changes) between two conditions for several thousand proteins. However, we observed that the fold changes of individual peptides may differ significantly from those of other peptides of the same protein, and that these differences cannot be explained by imprecise measurements. Such outlier peptides can arise for technical reasons (misidentifications, misquantifications) or biological reasons (post-translational modifications, differential regulation of isoforms). We developed a method to detect outlier peptides in mass spectrometry data that distinguishes imprecise measurements from real outlier peptides with high accuracy, even when the true difference is as small as 1.4-fold. We applied our method to experimental data and investigated the different technical and biological effects that result in outlier peptides. Our method will help future research to reduce technical bias and can help to identify genes with differentially regulated protein isoforms in high-throughput mass spectrometry data.
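The general idea of comparing one peptide against the other peptides of the same protein can be illustrated with a minimal sketch. This is not the method developed here; it swaps in a simple leave-one-out robust z-score on log2 fold changes as a stand-in. The function name `flag_outlier_peptides` and the parameters `z_cutoff` and `noise_floor` are hypothetical, and their default values are assumptions chosen for illustration only. Only the 1.4-fold minimum difference is taken from the text above.

```python
import math
from statistics import median


def flag_outlier_peptides(log2_ratios, min_fold=1.4, z_cutoff=3.5, noise_floor=0.1):
    """Flag peptides whose log2 fold change deviates from the rest of their protein.

    log2_ratios: dict mapping peptide id -> log2 fold change, for ONE protein.
    z_cutoff and noise_floor are illustrative assumptions, not published values.
    """
    flagged = []
    if len(log2_ratios) < 4:
        # Too few peptides to estimate a protein-level spread.
        return flagged
    for peptide, value in log2_ratios.items():
        # Leave the candidate out so it cannot mask itself.
        others = [v for p, v in log2_ratios.items() if p != peptide]
        center = median(others)
        mad = median([abs(v - center) for v in others])
        # 1.4826 * MAD estimates the standard deviation for normal data;
        # the floor stands in for measurement imprecision (assumed value).
        scale = max(1.4826 * mad, noise_floor)
        diff = value - center
        large_enough = abs(diff) >= math.log2(min_fold)
        unlikely_noise = abs(diff) / scale > z_cutoff
        if large_enough and unlikely_noise:
            flagged.append(peptide)
    return flagged


# Hypothetical log2 fold changes for the peptides of one protein.
example = {"pepA": 0.0, "pepB": 0.05, "pepC": -0.05, "pepD": 0.1, "pepE": 1.5}
print(flag_outlier_peptides(example))
```

In this sketch a peptide is flagged only if its deviation is both at least 1.4-fold and large relative to the spread of the remaining peptides, which mirrors the distinction between imprecise measurements and real outliers drawn above.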
