Paper information: When less can yield more – Computational preprocessing of MS/MS spectra for peptide identification

The effectiveness of database search algorithms such as Mascot, Sequest, and ProteinPilot is limited by the quality of the input spectra: spurious peaks in MS/MS spectra can jeopardize the correct identification of peptides or significantly reduce their scores. Consequently, efficient preprocessing of MS/MS spectra can increase the sensitivity of peptide identification, with reduced file sizes and run times, without compromising specificity. We investigate the performance of 25 MS/MS preprocessing methods on various data sets and make software for improved preprocessing of mgf/dta files freely available from http://hci.iwr.uni-heidelberg.de/mip/proteomics or http://www.childrenshospital.org/research/steenlab.
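The abstract does not name the 25 preprocessing methods, so the following is only a minimal sketch of one generic idea in this family: discarding low-intensity peaks from each spectrum in an MGF file before a database search. The function names `keep_top_n_peaks` and `filter_mgf`, the top-N rule, and the simplified MGF handling (peak lines are assumed to start with a digit and to carry m/z and intensity as their first two fields) are illustrative assumptions, not the authors' software.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def keep_top_n_peaks(peaks: List[Peak], n: int) -> List[Peak]:
    """Keep the n most intense peaks, returned in ascending m/z order."""
    if n <= 0:
        return []
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(strongest, key=lambda p: p[0])


def filter_mgf(text: str, n: int) -> str:
    """Apply keep_top_n_peaks to every BEGIN IONS/END IONS block of MGF text.

    Hypothetical helper: header lines (TITLE=, PEPMASS=, ...) pass through
    unchanged; the retained peaks are written just before END IONS.
    """
    out: List[str] = []
    peaks: List[Peak] = []
    in_spectrum = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            in_spectrum, peaks = True, []
            out.append(stripped)
        elif stripped == "END IONS":
            out.extend(f"{mz} {inten}" for mz, inten in keep_top_n_peaks(peaks, n))
            out.append(stripped)
            in_spectrum = False
        elif in_spectrum and stripped and stripped[0].isdigit():
            # Assumed peak line: "<m/z> <intensity> [charge]"
            fields = stripped.split()
            peaks.append((float(fields[0]), float(fields[1])))
        else:
            out.append(line)
    return "\n".join(out)
```

Calling `filter_mgf(mgf_text, 2)` on a spectrum with four peaks would keep only the two most intense ones, which also shrinks the file. Whether such a filter helps or hurts identification depends on the data and the search engine, which is the kind of question the paper evaluates.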
