MassWiz: a novel scoring algorithm with target-decoy based analysis pipeline for tandem mass spectrometry.

Mass spectrometry has advanced rapidly in recent years and has become the preferred method for proteomics. Although open-source algorithms for peptide identification exist, such as X!Tandem and OMSSA, the field has largely been dominated by proprietary software. There is a need for better, freely available, and configurable algorithms that identify the correct peptides while keeping false positives to a minimum. We have developed MassWiz, a novel empirical scoring function that gives appropriate weights to major ions, continuity of b-y ions, intensities, and the supporting neutral losses, based on the instrument type. We tested the accuracy of MassWiz on 486,882 spectra generated on 6 different instruments from a standard mixture of 18 proteins and downloaded from the Seattle Proteome Center public repository. We compared the MassWiz algorithm with Mascot, Sequest, OMSSA, and X!Tandem at 1% FDR. MassWiz outperformed all four in the largest data set (Agilent XCT) and was second only to Mascot in the other data sets. MassWiz also performed well in the analysis of high-confidence peptides, i.e., those identified by at least three algorithms. We also analyzed a yeast data set containing 106,133 spectra downloaded from the NCBI Peptidome repository and obtained similar results. The results demonstrate that MassWiz is an effective algorithm for high-confidence peptide identification without compromising on the number of assignments. MassWiz is open-source, versatile, and easily configurable.
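
The 1% FDR comparison relies on target-decoy analysis. The following is a minimal sketch of how a score cutoff at a given FDR might be derived from target and decoy matches. It is not the MassWiz implementation: the function name `score_threshold_at_fdr`, the `(score, is_decoy)` input layout, and the estimator FDR = decoys / targets are all assumptions made for illustration, and other estimators are in common use.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: hypothetical (score, is_decoy) pairs, where a higher score
    means a better peptide-spectrum match.
    Assumed estimator: decoys / targets among matches at or above the cutoff.
    Returns None if no cutoff satisfies the FDR limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best: Optional[float] = None
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Count all matches that share this score before evaluating the cutoff.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    print(score_threshold_at_fdr(example, max_fdr=0.01))
```

Matches scoring at or above the returned cutoff would be accepted. The sketch scans the whole ranked list rather than stopping at the first violation, so a cutoff further down the list is still found if the estimated FDR recovers there.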