PhosSA: Fast and accurate phosphorylation site assignment algorithm for mass spectrometry data

Phosphorylation site assignment for high-throughput tandem mass spectrometry (LC-MS/MS) data is one of the most common and critical tasks in phosphoproteomics. Correctly assigning phosphorylated residues is a prerequisite for understanding their biological significance. Common search algorithms (such as Sequest and Mascot) are not designed to perform site assignment; additional algorithms are therefore needed to assign phosphorylation sites in mass spectrometry data. The main contribution of this study is the design and implementation of a linear time and space dynamic programming strategy for phosphorylation site assignment, referred to as PhosSA. The proposed algorithm uses the summation of peak intensities associated with theoretical spectra as its objective function. Quality control of the assigned sites is achieved using a post-processing redundancy criterion that indicates the signal-to-noise properties of the fragmented spectra. The quality of the algorithm was assessed using experimentally generated data sets from synthetic peptides for which the phosphorylation sites were known. We report that PhosSA achieved a high degree of accuracy and sensitivity on all the experimentally generated mass spectrometry data sets. The implemented algorithm is shown to be extremely fast and scalable with an increasing number of spectra (we report up to 0.5 million spectra/hour on a moderate workstation). The algorithm is designed to accept results from both the Sequest and Mascot search engines. An executable is freely available at http://helixweb.nih.gov/ESBL/PhosSA/ for academic research purposes.
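To make the idea of a dynamic program with a summed-intensity objective concrete, the following is a minimal illustrative sketch and not the PhosSA implementation. It assumes singly charged b and y ions only, a fixed m/z tolerance, and candidate sites restricted to S, T and Y; the function name `assign_sites`, the parameter `tol`, and the table layout are all hypothetical choices made for this example. Because the b ion at a cleavage position depends only on how many phosphate groups lie to its left, the table is indexed by (position, groups placed so far), so for a fixed number of groups it grows linearly with peptide length.

```python
from bisect import bisect_left

# Monoisotopic residue masses (Da); standard values.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PHOSPHO = 79.96633   # HPO3
PROTON = 1.007276
WATER = 18.010565


def assign_sites(peptide, n_phospho, peaks, tol=0.5):
    """Return (sites, score) maximizing summed matched peak intensity.

    peaks is a list of (m/z, intensity) pairs; sites are 1-based positions.
    Returns None if n_phospho groups cannot be placed on S/T/Y residues.
    Illustrative only: singly charged b/y ions, fixed tolerance (assumptions).
    """
    peaks = sorted(peaks)
    mzs = [mz for mz, _ in peaks]

    def intensity_near(mz):
        # Sum intensities of all observed peaks within +/- tol of mz.
        k = bisect_left(mzs, mz - tol)
        total = 0.0
        while k < len(peaks) and peaks[k][0] <= mz + tol:
            total += peaks[k][1]
            k += 1
        return total

    n = len(peptide)
    prefix = [0.0]
    for aa in peptide:
        prefix.append(prefix[-1] + RESIDUE_MASS[aa])

    def cut_score(i, j):
        # Cleavage after residue i with j phosphate groups in the prefix.
        b = prefix[i] + j * PHOSPHO + PROTON
        y = (prefix[n] - prefix[i]) + (n_phospho - j) * PHOSPHO + WATER + PROTON
        return intensity_near(b) + intensity_near(y)

    neg = float("-inf")
    best = [[neg] * (n_phospho + 1) for _ in range(n + 1)]
    took = [[False] * (n_phospho + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(min(i, n_phospho) + 1):
            stay = best[i - 1][j]
            can_take = j > 0 and peptide[i - 1] in "STY"
            take = best[i - 1][j - 1] if can_take else neg
            prev = max(stay, take)
            if prev == neg:
                continue  # state unreachable
            gain = cut_score(i, j) if i < n else 0.0  # no cut after last residue
            best[i][j] = prev + gain
            took[i][j] = take > stay

    if best[n][n_phospho] == neg:
        return None

    # Trace back the chosen sites from the final state.
    sites, j = [], n_phospho
    for i in range(n, 0, -1):
        if took[i][j]:
            sites.append(i)
            j -= 1
    return sorted(sites), best[n][n_phospho]
```

Calling `assign_sites("AASAAT", 1, peaks)` with a list of observed `(m/z, intensity)` pairs returns the 1-based site positions and the summed intensity of the matched fragment peaks. A production tool would also need to handle multiple charge states, neutral losses and ties between equally scoring assignments, none of which are modelled here.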
