Improving peptide identification sensitivity in shotgun proteomics by stratification of search space.

Because of its high specificity, trypsin is the enzyme of choice in shotgun proteomics. Nonetheless, several publications report the identification of semitryptic and nontryptic peptides. Many of these peptides are thought to be signaling peptides or to have formed during sample preparation. It is also known that only a small fraction of the tandem mass spectra from a trypsin-digested protein mixture can be confidently matched to tryptic peptides. Setting aside other possibilities such as post-translational modifications and single-amino-acid polymorphisms, this suggests that many unidentified spectra originate from semitryptic and nontryptic peptides. Including these peptides in database searches, however, may not improve overall peptide identification, because the expanded search space can reduce sensitivity. To circumvent this issue for E-value-based search methods, we designed a scheme that categorizes qualified peptides (i.e., peptides whose molecular weights differ from the parent-ion mass by no more than a specified error tolerance) into three tiers: tryptic, semitryptic, and nontryptic. This classification allows peptides in different tiers to receive different Bonferroni correction factors. Our results show that this scheme can significantly improve retrieval performance compared with search strategies that assign the same Bonferroni correction factor to all qualified peptides.
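The tiering idea can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not give the tier-specific correction factors, so a plain weighted Bonferroni split is swapped in for illustration: tier k receives a hypothetical fraction `budget_k` of the error budget, and its correction factor is N_k / budget_k, where N_k is the number of qualified peptides in that tier. Further assumptions: unmodified residues with standard monoisotopic masses, trypsin cleaving after K or R regardless of a following proline, and protein termini counting as tryptic ends. All function names are hypothetical.

```python
"""Sketch of tiered (stratified) Bonferroni correction for E-values.

Assumptions (not from the paper): unmodified residues, trypsin cuts after
K/R even before proline, and a weighted Bonferroni split of the error budget
across tiers with caller-supplied (hypothetical) fractions.
"""

RESIDUE_MASS = {  # standard monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini
TIERS = ("tryptic", "semitryptic", "nontryptic")


def tryptic_boundary(protein: str, pos: int) -> bool:
    """True if a cut just before protein[pos] is consistent with trypsin.

    Protein termini always count; otherwise the preceding residue must be
    K or R (the proline rule is ignored, an assumption)."""
    return pos == 0 or pos == len(protein) or protein[pos - 1] in "KR"


def tier_of(protein: str, start: int, end: int) -> str:
    """Tier of protein[start:end] by its number of tryptic ends (2, 1, or 0)."""
    n_ends = tryptic_boundary(protein, start) + tryptic_boundary(protein, end)
    return TIERS[2 - n_ends]


def qualified_peptides(proteins, parent_mass, tol):
    """Yield (peptide, tier) for each subsequence within tol Da of parent_mass."""
    for prot in proteins:
        for i in range(len(prot)):
            mass = WATER
            for j in range(i, len(prot)):
                mass += RESIDUE_MASS[prot[j]]
                if mass > parent_mass + tol:
                    break  # extending the peptide only increases its mass
                if mass >= parent_mass - tol:
                    yield prot[i:j + 1], tier_of(prot, i, j + 1)


def tier_factors(tier_counts, budget):
    """Weighted Bonferroni: tier k gets correction factor N_k / budget_k.

    With budget_k = N_k / N_total every factor equals N_total (the uniform
    correction). Shifting budget toward the tryptic tier lowers its factor,
    while the expected number of false hits summed over nonempty tiers stays
    the same."""
    if abs(sum(budget.values()) - 1.0) > 1e-9:
        raise ValueError("budget fractions must sum to 1")
    return {t: tier_counts.get(t, 0) / budget[t] for t in TIERS}


def e_value(p_value, tier, factors):
    """E-value of a match: its p-value times its tier's correction factor."""
    return p_value * factors[tier]
```

For example, `qualified_peptides(["AKGGRSS"], 288.155, 0.01)` yields only `("GGR", "tryptic")`, since both of its ends follow K or R. The weighted split is chosen because it makes the trade-off explicit: reserving most of the budget for the small tryptic tier gives tryptic candidates a much smaller correction factor than the uniform N_total, at the cost of larger factors for the semitryptic and nontryptic tiers.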
