pepgrep: A Tool for Peptide MS/MS Pattern Matching

Protein sequences in collision-induced dissociation (CID) tandem MS (MS2) datasets are typically detected by mapping identified peptide ions back to protein sequences with a protein database search (PDS) engine. Finding a particular peptide sequence of interest in CID MS2 records very often requires manual evaluation of the spectrum, regardless of whether the PDS algorithm identified the peptide-associated MS2 scan. We have developed pepgrep, a compact, cross-platform, database-free command-line utility that helps to find the MS2 fingerprint of a selected peptide sequence by pattern-matching modelled MS2 data with a Peptide-to-MS2 scoring algorithm. pepgrep can incorporate dozens of mass offsets, corresponding to a variety of post-translational modifications (PTMs), into the algorithm. Decoy peptide sequences are scored alongside the tested peptide sequence to reduce false-positive results. The engine can screen an MS2 data file at a high rate when run in a cluster computing environment. The matched MS2 spectrum can be displayed through a built-in graphical application programming interface (API) or optionally written to a file. Using this algorithm, we found additional peptide sequences in the studied CID spectra that PDS identification had missed. We also found pepgrep especially useful for examining CID spectra of small peptide fractions, such as those produced by affinity purification techniques. The peptide sequences in such samples are less likely to be positively identified by the routine protein-centric algorithms implemented in PDS engines. The software is freely available at http://bsproteomics.essex.ac.uk:8080/data/download/pepgrep-1.4.tgz.
