Universal Metrics for Quality Assessment of Protein Identifications by Mass Spectrometry

Increasing numbers of large proteomic datasets are becoming available. As attempts are made to interpret these datasets and integrate them with other forms of genomic data, researchers are becoming more aware of the importance of data quality with respect to protein identification. We present three simple and universal metrics that describe different aspects of the quality of protein identifications by peptide mass fingerprinting. Hit ratio gives an indication of the signal-to-noise ratio in a mass spectrum, mass coverage measures the amount of protein sequence matched, and excess of limit-digested peptides reflects the completeness of the digestion that precedes the peptide mass fingerprinting. Receiver-operating characteristic plots show that the novel metric, excess of limit-digested peptides, can discriminate between correct and random matches more accurately than the search score when validating results from a state-of-the-art protein identification software system (Mascot), especially when it is combined with the other two metrics, hit ratio and mass coverage. Recommendations are made regarding the use of the metrics when reporting protein identification experiments.
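
The abstract names the three metrics without giving formulas, so the following is only a minimal sketch under stated assumptions, not the authors' definitions. It assumes hit ratio is the fraction of submitted peaks that matched the protein, mass coverage is the matched fraction of the sequence scaled by the protein mass in kDa, and excess of limit-digested peptides is the count of matched peptides with no missed cleavages minus the count with at least one. The `MatchedPeptide` type and all function and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatchedPeptide:
    """Hypothetical record for one peptide matched to a candidate protein."""
    start: int             # first residue covered (1-based, inclusive)
    end: int               # last residue covered (1-based, inclusive)
    missed_cleavages: int  # 0 means the peptide is limit-digested


def hit_ratio(n_matched_peaks: int, n_submitted_peaks: int) -> float:
    # Assumed definition: fraction of submitted peaks that matched the protein.
    return n_matched_peaks / n_submitted_peaks


def mass_coverage(peptides: List[MatchedPeptide],
                  protein_length: int,
                  protein_mass_kda: float) -> float:
    # Assumed definition: covered fraction of the sequence times protein mass (kDa).
    covered = set()
    for p in peptides:
        covered.update(range(p.start, p.end + 1))  # overlaps are counted once
    return len(covered) * protein_mass_kda / protein_length


def excess_limit_digested(peptides: List[MatchedPeptide]) -> int:
    # Assumed definition: limit-digested matches minus partially digested matches.
    limit = sum(1 for p in peptides if p.missed_cleavages == 0)
    partial = len(peptides) - limit
    return limit - partial


# Illustrative (made-up) match: 3 of 12 submitted peaks hit a 100-residue, 50 kDa protein.
example = [
    MatchedPeptide(start=1, end=10, missed_cleavages=0),
    MatchedPeptide(start=11, end=25, missed_cleavages=0),
    MatchedPeptide(start=20, end=40, missed_cleavages=1),
]
hr = hit_ratio(len(example), 12)
mc = mass_coverage(example, protein_length=100, protein_mass_kda=50.0)
eldp = excess_limit_digested(example)
```

Under these assumptions, a correct match from a complete digest would tend to show a positive excess of limit-digested peptides, while a random match would not favor either class of peptide.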
