Confidence assessment for protein identification by using peptide‐mass fingerprinting data

Protein identification using peptide mass fingerprinting (PMF) data remains an important but only partially solved problem. Current computational methods can produce false-positive identifications because the top hit from a database search is not necessarily the target protein. In addition, the identification scores that a scoring function assigns to each candidate protein (raw scores) are not normalized, so a ranking based on raw scores may be biased. To address these issues, we have developed a statistical model that evaluates the confidence of a raw score and improves the ranking of candidate proteins. The results show that the statistical model ranks the correct protein higher than the raw scores do. Our study thus provides a new method for improving the accuracy of protein identification from PMF data. We have incorporated the method, together with a user‐friendly graphical interface, into our software package “Protein‐Decision”. A standalone version of Protein‐Decision is freely available at http://digbio.missouri.edu/ProteinDecision/.
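
The abstract does not describe the form of the statistical model. The following Python sketch therefore only illustrates the general idea of normalizing raw scores before ranking, and it is not the Protein‐Decision method. It makes several assumptions. The raw score is a toy count of observed peak masses that match a theoretical peptide mass within a tolerance. The confidence comes from a simple binomial random-matching model in which theoretical masses are spread uniformly over an 800 to 4000 Da window. All function names, the 0.2 Da tolerance, the mass window, and the example masses are hypothetical.

```python
"""Illustrative sketch only (hypothetical, not the Protein-Decision model):
normalizing raw PMF scores with a simple binomial random-matching model."""
from math import comb

TOL_DA = 0.2                  # assumed mass tolerance (Da)
MASS_RANGE = (800.0, 4000.0)  # assumed mass window for peptide peaks (Da)


def match_count(observed, theoretical, tol=TOL_DA):
    """Toy raw score: number of observed peak masses within +/- tol of any theoretical mass."""
    return sum(1 for m in observed if any(abs(m - t) <= tol for t in theoretical))


def random_hit_prob(n_peptides, tol=TOL_DA, mass_range=MASS_RANGE):
    """Chance that one random peak matches at least one of n_peptides theoretical masses,
    assuming those masses are independent and uniform over mass_range (a simplification)."""
    width = mass_range[1] - mass_range[0]
    p_single = min(1.0, 2 * tol / width)
    return 1.0 - (1.0 - p_single) ** n_peptides


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random matches among n peaks."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rank_by_significance(observed, proteins, tol=TOL_DA):
    """Return (protein, raw_score, p_value) tuples, most significant (smallest p) first."""
    results = []
    for name, peptides in proteins.items():
        raw = match_count(observed, peptides, tol)
        p_value = binomial_tail(raw, len(observed), random_hit_prob(len(peptides), tol))
        results.append((name, raw, p_value))
    return sorted(results, key=lambda row: row[2])


# Hypothetical example data: a long protein with many theoretical peptides and a short one.
OBSERVED = [1000.0, 1600.0, 2400.05, 3333.3]
PROTEINS = {
    "big": [800.0 + 1.6 * i for i in range(2000)],
    "small": [1000.1, 3333.4] + [850.0 + 100.0 * i for i in range(18)],
}

if __name__ == "__main__":
    for name, raw, p_value in rank_by_significance(OBSERVED, PROTEINS):
        print(f"{name}: raw score = {raw}, p = {p_value:.2e}")
```

In this toy data the long protein has the higher raw score (3 matches versus 2). That advantage comes largely from its 2,000 theoretical masses, which catch random peaks easily. Two matches to the short protein are far less likely by chance, so the normalized ranking puts the short protein first. This shows one plausible way raw scores can bias a ranking. The model actually used in Protein‐Decision may differ.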
