Rapid and Accurate Generation of Peptide Sequence Tags with a Graph Search Approach

Identifying peptides from tandem mass spectra (MS/MS) is a challenging task. Database-search approaches to peptide identification are time-consuming because of the huge search space. De novo sequencing approaches, which derive a peptide sequence directly from an MS/MS spectrum, usually have high computational complexity, and their accuracy depends heavily on the quality of the spectra. In this paper, we develop an accurate and efficient algorithm for peptide identification. The algorithm consists of the following steps. First, we find a pair of complementary mass peaks that correspond to a b-ion and a y-ion, respectively. We then use the two peaks as the initial nodes of two trees and extend the trees so that, in the end, their nodes are elements of a b-ion set and a y-ion set, respectively. Second, we apply breadth-first search to the trees to generate peptide sequence tags. Finally, we design a weight function to evaluate the reliability of the tags and rank them. Experiments on 2620 experimental MS/MS spectra with one post-translational modification (PTM) show that our algorithm achieves better accuracy than other approaches, with higher efficiency.
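
The first two steps can be illustrated with a minimal sketch. It is not the implementation described in this paper. It assumes singly charged fragment ions, monoisotopic masses, a fixed mass tolerance, and a small subset of amino acids. The names `complementary_pairs`, `extend_tags`, and `RESIDUE_MASS` are hypothetical and chosen for illustration only. Under these assumptions, a b-ion and its complementary y-ion have m/z values that sum to the neutral peptide mass plus two proton masses, and two peaks of the same ion type that differ by one residue mass contribute one letter to a tag. The sketch extends tags only toward higher m/z and does not reproduce the tree construction or the weight function.

```python
from collections import deque

PROTON = 1.007276   # proton mass in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Monoisotopic residue masses (Da) for a subset of amino acids (illustrative only).
RESIDUE_MASS = {
    "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "F": 147.06841, "R": 156.10111,
}


def complementary_pairs(peaks, peptide_mass, tol=0.02):
    """Return (low, high) peak pairs whose m/z values sum to
    peptide_mass + 2 * PROTON within tol (singly charged ions assumed)."""
    target = peptide_mass + 2 * PROTON
    peaks = sorted(peaks)
    pairs = []
    lo, hi = 0, len(peaks) - 1
    # Two-pointer scan over the sorted peak list.
    while lo < hi:
        total = peaks[lo] + peaks[hi]
        if abs(total - target) <= tol:
            pairs.append((peaks[lo], peaks[hi]))
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return pairs


def extend_tags(peaks, seed, min_len=3, tol=0.02):
    """Breadth-first search from a seed peak toward higher m/z.
    An edge joins two peaks whose difference matches one residue mass.
    Returns the tags of paths that cannot be extended further."""
    peaks = sorted(peaks)
    tags = []
    queue = deque([(seed, "")])
    while queue:
        mz, tag = queue.popleft()
        extended = False
        for nxt in peaks:
            if nxt <= mz:
                continue
            for aa, mass in RESIDUE_MASS.items():
                if abs(nxt - mz - mass) <= tol:
                    queue.append((nxt, tag + aa))
                    extended = True
        if not extended and len(tag) >= min_len:
            tags.append(tag)
    return tags
```

In this sketch, a peak from a complementary pair would serve as the `seed` passed to `extend_tags`. Ranking the resulting tags would require a scoring function, which is not specified here and is therefore left out.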
