Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide MS/MS spectra and noise reduction

Background: Tandem mass spectrometry (MS/MS) has become a standard method for identifying proteins extracted from biological samples, but the huge number of MS/MS spectra and their contamination with noise obstruct swift and reliable computer-aided interpretation. Typically, only a minor fraction of the spectra per sample (most often just a few percent) and about 10% of the peaks per spectrum contribute to the final result, provided that noise does not prevent protein identification altogether.

Results: Two fast preprocessing screens can substantially reduce the haystack of MS/MS data. (1) Simple sequence ladder rules remove spectra that cannot be interpreted as peptide sequences. (2) Modified Fourier-transform-based criteria clear background in the remaining data. On average, only 35% of the MS/MS spectra (each reduced in size by about one quarter) have to be handed over to the interpretation software for reliable protein identification, essentially without loss of information, with a trend toward improved sequence coverage and a proportional decrease in computer resource consumption.

Conclusions: The search for sequence ladders in MS/MS spectra, followed by noise suppression, is a promising strategy to reduce the number of MS/MS spectra from electrospray instruments and to enhance the reliability of protein matches. Supplementary material and the software are available from the accompanying website at http://mendel.bii.a-star.edu.sg/mass-spectrometry/MSCleaner-2.0/.
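
To illustrate the idea behind the first screen, the following minimal Python sketch tests whether a peak list contains a sequence ladder, that is, a chain of peaks whose consecutive spacings match amino acid residue masses. It is not the rule set implemented in MSCleaner, which the abstract does not specify. The function names `longest_ladder` and `is_interpretable`, the 0.3 Da tolerance, the minimum of three ladder steps, and the restriction to singly charged fragment ions are all assumptions made for this example; only the monoisotopic residue masses are standard values.

```python
# Monoisotopic residue masses in Da (standard values; I is isobaric with L).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def longest_ladder(mz_values, tol=0.3):
    """Return the largest number of consecutive residue-mass steps
    that can be chained through the given peaks.

    Assumes singly charged fragments, so an m/z gap equals a mass gap.
    The tolerance `tol` (Da) is an illustrative choice.
    """
    peaks = sorted(mz_values)
    # steps[i] = length of the longest ladder ending at peak i
    steps = [0] * len(peaks)
    for i, upper in enumerate(peaks):
        for j in range(i):
            gap = upper - peaks[j]
            if any(abs(gap - m) <= tol for m in RESIDUE_MASSES.values()):
                steps[i] = max(steps[i], steps[j] + 1)
    return max(steps, default=0)


def is_interpretable(mz_values, min_steps=3, tol=0.3):
    """Hypothetical screen: keep a spectrum only if it contains a ladder
    of at least `min_steps` residue-mass steps."""
    return longest_ladder(mz_values, tol) >= min_steps


# A ladder built from the spacings G, A, S starting at an arbitrary peak.
ladder_peaks = [175.119, 232.14046, 303.17757, 390.2096]
# Peaks whose pairwise spacings match no residue mass.
noise_peaks = [100.0, 150.3, 210.7, 333.3]
```

With these inputs, `is_interpretable(ladder_peaks)` keeps the first spectrum and `is_interpretable(noise_peaks)` rejects the second. The quadratic pairwise comparison is acceptable for peak lists of a few hundred entries; a production screen would also need to handle multiply charged ions and intensity thresholds.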
