Examining troughs in the mass distribution of all theoretically possible tryptic peptides.

This work describes the mass distribution of all theoretically possible tryptic peptides composed of the 20 amino acids, up to a mass of 3 kDa, at a resolution of 0.001 Da. We characterize the regions between the peaks of the distribution, including gaps (forbidden zones) and sparsely populated areas (quiet zones). We show how the gaps shrink with increasing mass and where in the mass range they disappear completely. We demonstrate that peptide compositions in quiet zones are less diverse than those in the peaks of the distribution, and that eliminating certain types of unrealistic compositions can widen the gaps. The mass distribution is generated by a parallel implementation of a recursive procedure that enumerates all amino acid compositions. This implementation enumerates all compositions of tryptic peptides below 3 kDa in 48 min on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores). The results of this work can be used to facilitate protein identification and mass defect labeling in mass spectrometry-based proteomics experiments.
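The recursive enumeration of amino acid compositions can be illustrated with a minimal serial sketch. It is not the authors' implementation. It assumes that a tryptic peptide consists of any multiset of the 18 non-K/R residues plus exactly one C-terminal K or R (no missed cleavages), that masses are neutral monoisotopic masses including one water, and that leucine and isoleucine count as distinct compositions. The names `mass_histogram`, `RESIDUES`, and `C_TERMINAL` are hypothetical. Masses are stored as integers in units of 1e-5 Da so that binning at 0.001 Da is exact.

```python
from collections import Counter

# Monoisotopic residue masses in units of 1e-5 Da (integers avoid float drift).
RESIDUES = {
    "G": 5702146, "A": 7103711, "S": 8703203, "P": 9705276, "V": 9906841,
    "T": 10104768, "C": 10300919, "L": 11308406, "I": 11308406,
    "N": 11404293, "D": 11502694, "Q": 12805858, "E": 12904259,
    "M": 13104049, "H": 13705891, "F": 14706841, "Y": 16306333,
    "W": 18607931,
}
# Assumption: every peptide ends in exactly one K or R and has none internally.
C_TERMINAL = {"K": 12809496, "R": 15610111}
WATER = 1801056


def mass_histogram(max_mass, bin_width=100):
    """Count compositions per mass bin (bin_width=100 means 0.001 Da bins).

    max_mass is the inclusive upper mass limit, in units of 1e-5 Da.
    Returns a Counter mapping bin index (mass // bin_width) to the number
    of distinct amino acid compositions whose mass falls in that bin.
    """
    # Sorting by mass lets the loop stop at the first residue that is too heavy.
    masses = sorted(RESIDUES.values())
    lightest_end = min(C_TERMINAL.values()) + WATER
    histogram = Counter()

    def recurse(start, mass):
        # Record the current multiset once for each allowed C-terminal residue.
        for end_mass in C_TERMINAL.values():
            total = mass + end_mass + WATER
            if total <= max_mass:
                histogram[total // bin_width] += 1
        # Extend only with residues at index >= start, so each multiset
        # (composition) is generated exactly once, never as a permutation.
        for i in range(start, len(masses)):
            new_mass = mass + masses[i]
            if new_mass + lightest_end > max_mass:
                break  # all later residues are at least as heavy
            recurse(i, new_mass)

    recurse(0, 0)
    return histogram


if __name__ == "__main__":
    hist = mass_histogram(25_000_000)  # compositions up to 250 Da
    print(sum(hist.values()), "compositions in", len(hist), "occupied bins")
```

Under these assumptions, a limit of 250 Da yields 11 compositions (K, R, GK, GR, AK, AR, SK, PK, VK, TK, CK). Empty bins between occupied ones correspond to the gaps discussed above. The top-level branches of the recursion are independent of one another, which is one plausible way such a procedure could be distributed across cores; the sketch itself makes no attempt to reach 3 kDa, where the number of compositions is far too large for serial Python.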
