Bioinformatics Methods for Protein Identification Using Peptide Mass Fingerprinting

Protein identification by mass spectrometry (MS) is an important technique in proteomics. When an MS spectrum is searched against a protein database, candidate proteins are ranked by a scoring function, and the top-ranked protein is often taken as the correct identification. Peptide mass fingerprinting (PMF) is one of the major MS-based methods for protein identification. It is faster and cheaper than the other popular technique, tandem mass spectrometry (MS/MS). Key bioinformatics issues in PMF analysis include designing a scoring function that quantitatively measures the consistency between a PMF spectrum and a protein sequence, and assessing the confidence of the identified proteins. In this chapter, we introduce several scoring functions developed by other groups and by us. We also present a new statistical model that evaluates the confidence of a score and improves the ranking of candidate proteins. Our methods are implemented in the software package "ProteinDecision" (http://digbio.missouri.edu/ProteinDecision/).
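The database search described above works in three steps. Each database protein is digested in silico. The theoretical peptide masses are then compared with the observed peaks, and the proteins are ranked by score. The Python sketch below illustrates these steps under several assumptions that do not come from the chapter. Trypsin is taken to cleave after K or R unless the next residue is P, standard monoisotopic residue masses are used, ions are singly charged [M+H]+, there are no modifications, and the mass tolerance is 50 ppm. The function names (`tryptic_peptides`, `mh_mass`, `pmf_score`, `rank_proteins`) and the toy sequences are hypothetical. The score is a plain count of shared peaks. It is the simplest possible scoring function and is neither one of the scoring functions discussed in this chapter nor ProteinDecision's implementation.

```python
"""Minimal peptide mass fingerprinting (PMF) sketch: digest, score, rank.

Illustrative assumptions, not the chapter's method: trypsin cleaves after K/R
unless the next residue is P; monoisotopic masses; singly protonated [M+H]+
ions; no modifications; score = number of observed peaks matched.
"""

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (N- and C-terminal H and OH)
PROTON = 1.007276   # charge carrier for the [M+H]+ ion


def tryptic_peptides(seq, missed_cleavages=1):
    """In silico trypsin digest allowing up to `missed_cleavages` missed sites."""
    # Cleave after K or R unless the following residue is P.
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        # extra = number of consecutive cleavage sites skipped.
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(bounds):
                peptides.append(seq[bounds[start]:bounds[end]])
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(peaks, seq, tol_ppm=50.0, missed_cleavages=1):
    """Count observed peaks within tol_ppm of any theoretical peptide mass."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(seq, missed_cleavages)]
    return sum(
        any(abs(obs - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for obs in peaks
    )


def rank_proteins(peaks, database, tol_ppm=50.0):
    """Score every protein in `database` (name -> sequence), best first."""
    scored = [(name, pmf_score(peaks, seq, tol_ppm))
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; the sequences are arbitrary illustrations.
    database = {"protA": "AKPLRGKWVTFISR", "protB": "GASPVTCLINDQKEMHFR"}
    # Simulated spectrum: exact [M+H]+ masses of protA's fully cleaved peptides.
    peaks = [mh_mass(p) for p in tryptic_peptides(database["protA"], 0)]
    print(rank_proteins(peaks, database))  # -> [('protA', 3), ('protB', 0)]
```

A raw shared-peak count favors long proteins, because they yield many theoretical peptides and therefore more chance matches. For this reason, practical PMF scores usually correct for protein size or model the probability of random matches. Assessing the confidence of the top-ranked protein, which the chapter addresses with its statistical model, is outside the scope of this sketch.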
