Fast and accurate identification of semi-tryptic peptides in shotgun proteomics

MOTIVATION One of the major problems in shotgun proteomics is the low peptide coverage when analyzing complex protein samples. Identifying more peptides, e.g. non-tryptic peptides, may increase the peptide coverage and improve protein identification and/or quantification that are based on the peptide identification results. Searching for all potential non-tryptic peptides is, however, time consuming for shotgun proteomics data from complex samples, and poses a challenge for a routine data analysis. RESULTS We hypothesize that non-tryptic peptides are mainly created from the truncation of regular tryptic peptides before separation. We introduce the notion of truncatability of a tryptic peptide, i.e. the probability of the peptide to be identified in its truncated form, and build a predictor to estimate a peptide's truncatability from its sequence. We show that our predictions achieve useful accuracy, with the area under the ROC curve from 76% to 87%, and can be used to filter the sequence database for identifying truncated peptides. After filtering, only a limited number of tryptic peptides with the highest truncatability are retained for non-tryptic peptide searching. By applying this method to identification of semi-tryptic peptides, we show that a significant number of such peptides can be identified within a searching time comparable to that of tryptic peptide identification.

[1]  J. Yates Mass spectral analysis in proteomics. , 2004, Annual review of biophysics and biomolecular structure.

[2]  K. Resing,et al.  Proteomics strategies for protein identification , 2005, FEBS letters.

[3]  Pavel A. Pevzner,et al.  Peptide sequence tags for fast database search in mass-spectrometry. , 2005 .

[4]  J. Yates,et al.  Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. , 1995, Analytical chemistry.

[5]  M. Vihinen,et al.  Accuracy of protein flexibility predictions , 1994, Proteins.

[6]  James P. Reilly,et al.  A computational approach toward label-free protein quantification using predicted peptide detectability , 2006, ISMB.

[7]  P. Radivojac,et al.  Protein flexibility and intrinsic disorder , 2004, Protein science : a publication of the Protein Society.

[8]  S. Vucetic,et al.  Flavors of protein disorder , 2003, Proteins.

[9]  David L Tabb,et al.  Efficient and specific trypsin digestion of microgram to nanogram quantities of proteins in organic-aqueous solvent systems. , 2006, Analytical chemistry.

[10]  Daniel B. Martin,et al.  Computational prediction of proteotypic peptides for quantitative proteomics , 2007, Nature Biotechnology.

[11]  D. Eisenberg,et al.  The hydrophobic moment detects periodicity in protein hydrophobicity. , 1984, Proceedings of the National Academy of Sciences of the United States of America.

[12]  R. Aebersold,et al.  Mass spectrometry-based proteomics , 2003, Nature.

[13]  L. Hunter,et al.  Proteomic informatics. , 2004, International review of neurobiology.

[14]  M. Mann,et al.  Trypsin Cleaves Exclusively C-terminal to Arginine and Lysine Residues*S , 2004, Molecular & Cellular Proteomics.

[15]  S. Hanash,et al.  Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study , 2006, Nature Biotechnology.

[16]  Pavel A. Pevzner,et al.  Peptide Sequence Tags for Fast Database Search in Mass-Spectrometry , 2005, RECOMB.

[17]  Dekel Tsur,et al.  Identification of post-translational modifications by blind search of mass spectra , 2005, Nature Biotechnology.

[18]  P. Romero,et al.  Sequence complexity of disordered protein , 2001, Proteins.

[19]  Martin A. Riedmiller,et al.  A direct adaptive method for faster backpropagation learning: the RPROP algorithm , 1993, IEEE International Conference on Neural Networks.

[20]  Zoran Obradovic,et al.  Predicting intrinsic disorder from amino acid sequence , 2003, Proteins.

[21]  E. Marcotte,et al.  Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation , 2007, Nature Biotechnology.

[22]  D. Tabb,et al.  Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. , 2007, Journal of proteome research.

[23]  P. Pevzner,et al.  PepNovo: de novo peptide sequencing via probabilistic network modeling. , 2005, Analytical chemistry.

[24]  D. N. Perkins,et al.  Probability‐based protein identification by searching sequence databases using mass spectrometry data , 1999, Electrophoresis.

[25]  K Worowski [Mechanism of action of proteolytic enzymes]. , 1978, Postepy higieny i medycyny doswiadczalnej.

[26]  Vishwajeeth R Pagala,et al.  Proteomic studies of the intrinsically unstructured mammalian proteome. , 2006, Journal of proteome research.