PTMProphet: Fast and Accurate Mass Modification Localization for the Trans-Proteomic Pipeline

Sequence database search engines based on spectral matching are widely used in mass spectrometry-based proteomics experiments. They excel at identifying peptide sequence ions, including sequence ions that may carry post-translational modifications (PTMs), but most do not provide confidence metrics for the exact localization of those PTMs when several possible sites are available. Accurate localization is required for downstream molecular cell biology analysis of PTM function in vitro and in vivo. Therefore, we developed PTMProphet, a free and open-source software tool integrated into the Trans-Proteomic Pipeline. PTMProphet reanalyzes identified spectra from any search engine for which pepXML output is available and provides localization confidence, enabling appropriate further characterization of biological events. Localization of any type of mass modification (e.g., phosphorylation) is supported. PTMProphet applies Bayesian mixture models to compute a probability for each candidate site in each peptide-spectrum match where a PTM has been identified. These probabilities can be combined to compute a global false localization rate at any threshold to guide downstream analysis. We describe the PTMProphet tool and its underlying algorithms, and demonstrate its performance on ground-truth synthetic peptide reference datasets (one previously published small dataset and one new larger dataset), as well as on a previously published phospho-enriched dataset in which the correct sites of modification are unknown. Data have been deposited to ProteomeXchange with identifier PXD013210.
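
As an illustration of how per-site probabilities can be combined into a global false localization rate (FLR), the sketch below uses a common estimator for posterior probabilities: among the sites accepted at a threshold, the expected number of wrong localizations is the sum of (1 - p), and dividing by the number of accepted sites gives the estimated FLR. This is an assumption for illustration only; the abstract does not give PTMProphet's exact formula, and the function name `estimated_flr` and the example probabilities are hypothetical.

```python
from typing import Sequence


def estimated_flr(site_probs: Sequence[float], threshold: float) -> float:
    """Estimate the false localization rate among sites with p >= threshold.

    Assumes each value is a calibrated probability that the site
    localization is correct (an assumption, not PTMProphet's documented API).
    """
    accepted = [p for p in site_probs if p >= threshold]
    if not accepted:
        # Nothing passes the threshold, so no localizations are reported.
        return 0.0
    # Expected number of incorrect localizations among the accepted sites.
    expected_wrong = sum(1.0 - p for p in accepted)
    return expected_wrong / len(accepted)


if __name__ == "__main__":
    # Hypothetical site probabilities, made up for this example.
    probs = [0.99, 0.95, 0.90, 0.60, 0.50]
    for t in (0.9, 0.5):
        print(f"threshold={t:.2f}  estimated FLR={estimated_flr(probs, t):.4f}")
```

Lowering the threshold accepts more sites at the cost of a higher estimated FLR, so a threshold can be chosen to meet a target error rate.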
