Interpretation of mass spectrometry data for high-throughput proteomics

Recent developments in proteomics have exposed a bottleneck in bioinformatics: the high-quality interpretation of acquired MS data. Instruments can now generate thousands of MS spectra per day, and this throughput is in demand, so manual analysis is no longer adequate and the capabilities of an expert human user must be transferred into sophisticated MS interpretation algorithms. The identification rate in current high-throughput proteomics studies therefore depends not only on instrumentation but also on data interpretation. We present software for high-throughput peptide mass fingerprint (PMF) identification that enables robust and confident protein identification at higher rates. This is achieved by automated calibration, peak rejection, and a meta search approach that employs several PMF search engines. The automatic calibration is a dynamic algorithm that depends on the information available in each spectrum; it combines several known calibration methods and iteratively establishes an optimised calibration. The peak rejection algorithm filters out signals unrelated to the analysed protein, using automatically generated, dataset-dependent exclusion lists. In the "meta search", several known PMF search engines are triggered and their results are merged by means of a meta score. The significance of the meta score was assessed by simulating PMF identification with 10,000 artificial spectra that closely resemble the measured dataset. Through this simulation, the meta score is linked to expectation values as a statistical measure. The software is part of the proteome database ProteinScape, which links the information derived from MS data to other relevant proteomics data. We demonstrate the performance of the system on a dataset of 1891 PMF spectra. As a result of automatic calibration and peak rejection, the identification rate increased from 6% to 44%.
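The idea of a dataset-dependent exclusion list can be illustrated with a minimal Python sketch. It is not the algorithm implemented in the presented software: it simply assumes that a mass recurring in a large fraction of the spectra of one dataset is more likely a contaminant or matrix signal than a peptide of the analysed protein. The function names, the mass tolerance of 0.1 Da, and the recurrence threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def build_exclusion_list(spectra, tolerance_da=0.1, min_fraction=0.5):
    """Collect masses that recur in at least `min_fraction` of the spectra.

    `spectra` is a list of peak lists (masses in Da). Masses are grouped
    into bins of width `tolerance_da`; both parameters are illustrative
    assumptions, not values taken from the paper.
    """
    counts = Counter()
    for peaks in spectra:
        # Use a set so that each bin is counted at most once per spectrum.
        counts.update({round(mass / tolerance_da) for mass in peaks})
    threshold = min_fraction * len(spectra)
    return sorted(
        bin_index * tolerance_da
        for bin_index, n in counts.items()
        if n >= threshold
    )


def reject_peaks(peaks, exclusion_list, tolerance_da=0.1):
    """Drop every peak lying within `tolerance_da` of an excluded mass."""
    return [
        mass
        for mass in peaks
        if all(abs(mass - excluded) > tolerance_da for excluded in exclusion_list)
    ]


# Made-up peak lists: three masses recur in at least two of four spectra.
spectra = [
    [842.51, 1045.56, 1500.70],
    [842.50, 2211.10, 1300.20],
    [842.52, 1045.57, 1800.90],
    [900.40, 2211.11, 1234.56],
]
exclusion = build_exclusion_list(spectra)
kept = reject_peaks(spectra[0], exclusion)
print(exclusion)
print(kept)
```

Fixed-width binning keeps the sketch short, but two nearly identical masses can fall on either side of a bin boundary; a production implementation would more plausibly cluster masses within a relative (ppm) tolerance.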
