Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell‐lines

Protein quantification at proteome‐wide scale is an important aim, enabling insights into fundamental cellular biology and serving to constrain experiments and theoretical models. While proteome‐wide quantification is not yet fully routine, many datasets approaching proteome‐wide coverage are becoming available through biophysical and mass spectrometry (MS) techniques. Such data can be accessed via a variety of sources, including publication supplements and online data repositories. However, access to the data is still fragmentary, and comparisons across experiments and organisms are not straightforward. Here, we describe recent updates to our database resource “PaxDb” (Protein Abundances Across Organisms). PaxDb focuses on protein abundance information at proteome‐wide scope, irrespective of the underlying measurement technique. Quantification data are reprocessed, unified, and quality‐scored, and then integrated to build a meta‐resource. PaxDb also enables evolutionary comparisons through precomputed gene orthology relations. Recently, we have expanded the scope of the database to include cell‐line samples, and we now scan the literature more systematically for suitable datasets. We report that a significant fraction of published experiments cannot readily be accessed and/or parsed for quantitative information, requiring additional steps and effort. The current update brings PaxDb to 414 datasets in 53 organisms, with (semi‐)quantitative abundance information covering more than 300 000 proteins.
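The reprocess, unify, score, and integrate steps can be illustrated with a minimal sketch. It assumes that "unifying" means rescaling each dataset to parts per million (ppm) of its total, and that "integrating" means a quality-score-weighted mean over the datasets in which a protein was quantified. Both choices, and the function names `to_ppm` and `integrate`, are hypothetical and serve only as illustration; this is not a description of PaxDb's actual pipeline.

```python
from collections import defaultdict


def to_ppm(raw: dict[str, float]) -> dict[str, float]:
    """Rescale one dataset's raw abundance values to parts per million (ppm).

    Assumption for illustration: each protein's share of the dataset total is
    multiplied by 1e6, so every normalized dataset sums to 1,000,000.
    """
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("dataset has no positive abundance values")
    return {protein: value / total * 1e6 for protein, value in raw.items()}


def integrate(datasets: list[tuple[dict[str, float], float]]) -> dict[str, float]:
    """Combine several datasets into one consensus abundance profile.

    Each entry is (raw abundances, quality score). Hypothetical scheme: a
    protein's integrated value is the score-weighted mean of its ppm values
    over the datasets in which it was quantified.
    """
    weighted_sum: dict[str, float] = defaultdict(float)
    weight_total: dict[str, float] = defaultdict(float)
    for raw, score in datasets:
        if score <= 0:
            raise ValueError("quality scores must be positive")
        # Unify units first so that datasets from different techniques are comparable.
        for protein, ppm in to_ppm(raw).items():
            weighted_sum[protein] += score * ppm
            weight_total[protein] += score
    return {p: weighted_sum[p] / weight_total[p] for p in weighted_sum}
```

For example, with made-up numbers, `integrate([({"P1": 30.0, "P2": 10.0}, 2.0), ({"P1": 500.0, "P3": 500.0}, 1.0)])` places P1 closer to the higher-scored first dataset, while P2 and P3 simply keep their single-dataset ppm values. Averaging only over datasets that contain a protein avoids treating "not detected" as zero abundance, which is one of several reasonable design choices.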
