The Pacific Northwest National Laboratory library of bacterial and archaeal proteomic biodiversity

This Data Descriptor announces the submission to public repositories of the PNNL Biodiversity Library, a large collection of global proteomics data for 112 bacterial and archaeal organisms. The data comprises 35,162 tandem mass spectrometry (MS/MS) datasets from ~10 years of research. All data has been searched, annotated and organized in a consistent manner to promote reuse by the community. Protein identifications were cross-referenced with KEGG functional annotations, which allows pathway-oriented investigation. We present the data as a freely available community resource and describe a variety of reuse options for computational modelling, proteomics assay design and bioengineering. Instrument data and analysis files are available at ProteomeXchange via the MassIVE partner repository under the identifiers PXD001860 and MSV000079053.
