New Data Base-independent, Sequence Tag-based Scoring of Peptide MS/MS Data Validates Mowse Scores, Recovers Below Threshold Data, Singles Out Modified Peptides, and Assesses the Quality of MS/MS Techniques

The Mascot score (M-score) is one of the conventional validity measures in data base identification of peptides and proteins from MS/MS data. Although tremendously useful, the M-score has a number of limitations. For the same MS/MS data, the M-score may change if the protein data base is expanded. A low M-score does not necessarily indicate a poor match; it may instead reflect poor MS/MS quality. In addition, the M-score does not take full advantage of the combined use of two complementary fragmentation techniques, collisionally activated dissociation (CAD) and electron capture dissociation (ECD). To address these issues, a new data base-independent scoring method (S-score) was designed that is based on the maximum length of the peptide sequence tag provided by the combined CAD and ECD data. Assessing the quality of MS/MS spectra by S-score allows poor data (39% of all MS/MS spectra) to be filtered out before the data base search, which speeds up the data analysis and eliminates a major source of false positive identifications. Spectra with below threshold M-scores (poor matches) but high S-scores are validated. Spectra with zero M-score (no data base match) but high S-score are classified as belonging to modified sequences. As an extension of S-score, an extremely reliable sequence tag was developed based on complementary fragments appearing simultaneously in CAD and ECD spectra. Comparison of this tag with the data base-derived sequence gives the most reliable peptide identification validation to date. The combined use of M- and S-scoring provides positive sequence identification from >25% of all MS/MS data, a 40% improvement over traditional M-scoring performed on the same Fourier transform MS instrumentation. The number of proteins reliably identified from Escherichia coli cell lysate thereby increased by 29% compared with the traditional M-score approach. Finally, S-scoring provides quantitative measures of the quality of fragmentation techniques, such as the minimum precursor ion abundance at which MS/MS gives the threshold S-score value of 2.
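The triage logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact S-score formula, so the sketch assumes the score equals the number of residues bracketed by the longest run of consecutive backbone cleavage sites seen in the pooled CAD and ECD data. The function names, the `m_threshold` and `s_high` parameters, and treating values below 2 as poor data are all hypothetical choices made for illustration; only the threshold value of 2 comes from the text.

```python
from typing import Iterable, Optional


def longest_tag_length(cleavage_sites: Iterable[int]) -> int:
    """Residues covered by the longest run of consecutive cleavage sites.

    Assumption: n consecutive backbone cleavage sites define a tag of
    n - 1 residues. The paper's exact definition may differ.
    """
    best = run = 0
    prev = None
    for site in sorted(set(cleavage_sites)):
        run = run + 1 if prev is not None and site == prev + 1 else 1
        best = max(best, run)
        prev = site
    return max(best - 1, 0)


def s_score(cad_sites: Iterable[int], ecd_sites: Iterable[int]) -> int:
    """Hypothetical S-score: tag length from pooled CAD and ECD sites."""
    return longest_tag_length(set(cad_sites) | set(ecd_sites))


def triage(s: int, m: Optional[float], m_threshold: float,
           s_high: int, s_min: int = 2) -> str:
    """Classify one spectrum from its S-score and M-score.

    s_min = 2 is the threshold named in the abstract; m_threshold and
    s_high are illustrative parameters supplied by the caller.
    """
    if s < s_min:
        return "discard_before_search"   # poor MS/MS quality
    if m is None or m == 0:
        # no data base match, but possibly a good spectrum
        return "candidate_modified" if s >= s_high else "unassigned"
    if m >= m_threshold:
        return "identified"              # conventional M-score hit
    # below threshold M-score, possibly rescued by a high S-score
    return "validated_by_s_score" if s >= s_high else "unassigned"


# Made-up cleavage positions: pooled sites are {2, 3, 4, 5, 7, 9}.
example_s = s_score(cad_sites={2, 3, 7}, ecd_sites={4, 5, 9})
example_class = triage(example_s, m=0, m_threshold=30, s_high=3)
```

In this made-up example neither technique alone yields a run longer than two sites, whereas the pooled data give four consecutive sites. This is the kind of gain from complementary fragmentation that the S-score is meant to capture.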
