PTMTreeSearch: a novel two-stage tree-search algorithm with pruning rules for the identification of post-translational modification of proteins in MS/MS spectra

MOTIVATION: Tandem mass spectrometry (MS/MS) has become a standard tool for identifying post-translational modifications (PTMs) of proteins. Algorithmic searches for PTMs in MS/MS data tend to be hampered both by noisy data and by a combinatorial explosion of the search space. This leads to high uncertainty and long search-execution times.

RESULTS: To address this issue, we present PTMTreeSearch, a new algorithm that uses a large database of known PTMs to identify PTMs from MS/MS data. For a given peptide sequence, PTMTreeSearch builds a computational tree in which each path from the root to a leaf is labeled with the amino acids of the peptide sequence, and branches represent PTMs. Various empirical pruning rules decrease the search-execution time by eliminating biologically unlikely solutions from the tree. PTMTreeSearch works in two stages: it first identifies a relatively small set of high-confidence PTM types, and then performs a more exhaustive search restricted to this set using relaxed search parameter settings. An analysis of experimental data shows that, under the same false discovery criteria, PTMTreeSearch annotates more peptides than current state-of-the-art PTM identification methods, at roughly the same execution time. PTMTreeSearch is implemented as a pluggable scoring function in the X!Tandem search engine.

AVAILABILITY: The source code of PTMTreeSearch and a demo server application are available at http://net.icgeb.org/ptmtreesearch
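The tree search and the two-stage strategy described in the abstract can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the PTMTreeSearch implementation: the toy PTM table `PTM_DB`, the functions `tree_search` and `two_stage_search`, the two pruning rules (a mass-bound check and a cap on PTMs per peptide), the stage-specific tolerances, and the use of PTM frequency in stage 1 as the "high-confidence" criterion are all hypothetical. The actual method scores candidates against the MS/MS spectrum inside X!Tandem; here candidates are matched only on the observed precursor mass shift.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy PTM table: residue -> [(PTM name, monoisotopic mass shift, Da)].
# A real run would load a large curated database instead.
PTM_DB: Dict[str, List[Tuple[str, float]]] = {
    "S": [("Phospho", 79.96633)],
    "T": [("Phospho", 79.96633)],
    "Y": [("Phospho", 79.96633)],
    "K": [("Acetyl", 42.01057), ("Methyl", 14.01565)],
    "M": [("Oxidation", 15.99491)],
    "N": [("Deamidation", 0.98402)],
}

Mod = Tuple[int, str]  # (0-based residue position, PTM name)


def tree_search(peptide: str, delta_mass: float,
                ptm_db: Dict[str, List[Tuple[str, float]]],
                max_mods: int, tol: float) -> List[List[Mod]]:
    """Depth-first walk of the PTM tree of one peptide.

    Every root-to-leaf path spells the peptide; at each residue the path
    branches into 'unmodified' plus one child per PTM allowed on it. Returns
    the PTM sets whose total mass shift is within tol of delta_mass.
    """
    n = len(peptide)
    # lo[i] / hi[i]: smallest / largest shift residues i..n-1 could still add.
    # These bounds ignore the PTM cap, so they are loose but never cut a
    # valid solution.
    lo = [0.0] * (n + 1)
    hi = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        shifts = [m for _, m in ptm_db.get(peptide[i], [])]
        lo[i] = lo[i + 1] + min([0.0] + shifts)
        hi[i] = hi[i + 1] + max([0.0] + shifts)

    results: List[List[Mod]] = []

    def walk(i: int, acc: float, mods: List[Mod]) -> None:
        # Pruning rule A (assumed): no completion of this path can reach the
        # observed shift, so the whole subtree is skipped.
        if acc + hi[i] < delta_mass - tol or acc + lo[i] > delta_mass + tol:
            return
        if i == n:  # leaf: a complete, mass-consistent annotation
            results.append(list(mods))
            return
        walk(i + 1, acc, mods)  # branch: residue i left unmodified
        # Pruning rule B (assumed): cap the number of PTMs per peptide.
        if len(mods) < max_mods:
            for name, shift in ptm_db.get(peptide[i], []):
                mods.append((i, name))
                walk(i + 1, acc + shift, mods)
                mods.pop()

    walk(0, 0.0, [])
    return results


def two_stage_search(spectra: List[Tuple[str, float]],
                     ptm_db: Dict[str, List[Tuple[str, float]]],
                     top_k: int = 2) -> List[Tuple[str, List[List[Mod]]]]:
    """spectra: hypothetical (peptide, observed precursor mass shift) pairs."""
    # Stage 1: strict settings (one PTM, tight tolerance), full database.
    counts: Counter = Counter()
    for pep, delta in spectra:
        for mods in tree_search(pep, delta, ptm_db, max_mods=1, tol=0.01):
            counts.update(name for _, name in mods)
    # Assumed confidence criterion: keep the top_k most frequent PTM types.
    keep = {name for name, _ in counts.most_common(top_k)}
    restricted = {aa: [(name, m) for name, m in opts if name in keep]
                  for aa, opts in ptm_db.items()}
    # Stage 2: relaxed settings (more PTMs, wider tolerance), restricted set.
    return [(pep, tree_search(pep, delta, restricted, max_mods=3, tol=0.05))
            for pep, delta in spectra]


if __name__ == "__main__":
    data = [("PEPSTK", 79.96633), ("AMK", 15.99491), ("MSK", 95.96124)]
    for pep, hits in two_stage_search(data, PTM_DB):
        print(pep, hits)
```

In this toy run, "MSK" carries two modifications (oxidation plus phosphorylation), so stage 1 (at most one PTM) finds nothing for it. Stage 2 then recovers it because both PTM types were already admitted to the restricted set by the other peptides. Restricting the PTM set first is what keeps the relaxed second pass from exploding combinatorially.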
