Proteomics data repositories: Providing a safe haven for your data and acting as a springboard for further research

Although data deposition is not yet common practice in proteomics, several mass spectrometry (MS)-based proteomics repositories are publicly available to the scientific community. The main existing resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, the PRoteomics IDEntifications database (PRIDE), Tranche, and NCBI Peptidome. This review describes the capabilities of each, paying special attention to four key properties: data types stored, applicable data submission strategies, supported formats, and available data mining and visualization tools. Additionally, the data contents from model organisms are enumerated for each resource. Other valuable smaller or more specialized repositories exist but are not covered in this review. Finally, the concept behind the ProteomeXchange consortium, a collaborative effort among the main resources in the field, is introduced.
