Automated reprocessing pipeline for searching heterogeneous mass spectrometric data of the HUPO Brain Proteome Project pilot phase

The newly available techniques for sensitive proteome analysis, and the resulting volume of data, require a new bioinformatics focus on automatic methods for spectrum reprocessing and peptide/protein validation. Manual validation of results in such studies is neither feasible nor objective enough for quality-relevant interpretation. Tools enabling automatic quality control are therefore essential for producing reliable and comparable data in large consortia such as the Human Proteome Organization Brain Proteome Project (HUPO BPP), for which standards and well-defined processing pipelines are important. We present a workflow that leads from choosing the right database model, through collecting the data and processing them against a decoy database, to a quality-controlled protein list merged from several search engines with a known false-positive rate.
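The decoy-based quality control summarized above can be illustrated with a minimal sketch. It rests on several assumptions not stated in this abstract: the decoy database is built by concatenating each target protein with its reversed sequence (randomized or shuffled decoys are also used), decoy entries carry a hypothetical `DECOY_` accession prefix, higher search-engine scores mean more confident matches, and the false-positive rate above a score cutoff is estimated as decoys/targets, one common estimator for concatenated searches. The function names `make_decoy_database`, `score_threshold`, `passing_accessions`, and `merge_engines` are illustrative only and are not taken from the described pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical accession prefix marking decoy entries in the combined database.
DECOY_PREFIX = "DECOY_"


def make_decoy_database(target: dict[str, str]) -> dict[str, str]:
    """Concatenate target proteins with reversed-sequence decoys (an assumed decoy method)."""
    combined = dict(target)
    for accession, sequence in target.items():
        combined[DECOY_PREFIX + accession] = sequence[::-1]
    return combined


@dataclass
class Hit:
    accession: str  # protein accession the spectrum was assigned to
    score: float    # search-engine score; assumed: higher means more confident


def is_decoy(hit: Hit) -> bool:
    return hit.accession.startswith(DECOY_PREFIX)


def score_threshold(hits: list[Hit], max_fdr: float) -> float | None:
    """Lowest score cutoff at which decoys/targets (hits >= cutoff) stays <= max_fdr."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    best = None
    targets = decoys = 0
    for i, hit in enumerate(ranked):
        if is_decoy(hit):
            decoys += 1
        else:
            targets += 1
        # Evaluate a cutoff only after every hit sharing this score has been counted.
        if i + 1 < len(ranked) and ranked[i + 1].score == hit.score:
            continue
        if targets and decoys / targets <= max_fdr:
            best = hit.score
    return best


def passing_accessions(hits: list[Hit], max_fdr: float) -> set[str]:
    """Target and decoy accessions at or above one engine's FDR-controlled cutoff."""
    cutoff = score_threshold(hits, max_fdr)
    if cutoff is None:
        return set()
    return {h.accession for h in hits if h.score >= cutoff}


def merge_engines(results: dict[str, list[Hit]], max_fdr: float) -> tuple[set[str], float]:
    """Union the per-engine lists, then re-estimate the rate on the merged list."""
    passing: set[str] = set()
    for hits in results.values():
        passing |= passing_accessions(hits, max_fdr)
    decoys = {a for a in passing if a.startswith(DECOY_PREFIX)}
    targets = passing - decoys
    merged_fdr = len(decoys) / len(targets) if targets else 0.0
    return targets, merged_fdr


if __name__ == "__main__":
    print(make_decoy_database({"P1": "PEPTIDEK"}))
    # prints {'P1': 'PEPTIDEK', 'DECOY_P1': 'KEDITPEP'}
    results = {
        "engine_a": [Hit("P1", 90), Hit("P2", 80), Hit("DECOY_P2", 50), Hit("P3", 40)],
        "engine_b": [Hit("P4", 30), Hit("P1", 25), Hit("DECOY_P4", 10)],
    }
    targets, fdr = merge_engines(results, max_fdr=0.05)
    print(sorted(targets), fdr)  # prints ['P1', 'P2', 'P4'] 0.0
```

Taking the union of per-engine filtered lists is only one possible merging rule. Because a union can accumulate more decoys than any single engine admits, the sketch re-estimates the rate on the merged list rather than assuming the per-engine cutoff carries over; counting decoys at the protein-accession level, as done here, is itself a simplification.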
