ProteomicsDB: a multi-omics and multi-organism resource for life science research

Abstract: ProteomicsDB (https://www.ProteomicsDB.org) started as a protein-centric in-memory database for the exploration of large collections of quantitative mass spectrometry-based proteomics data. Its data types and contents have grown over time to include RNA-Seq expression data, drug-target interactions and cell line viability data. In this manuscript, we summarize new developments since the previous update, published in Nucleic Acids Research in 2017. Over the past two years, we have enriched the data content with additional datasets and extended the platform to support protein turnover data. Another important addition is that ProteomicsDB now supports the storage and visualization of data collected from other organisms, exemplified by Arabidopsis thaliana. Owing to the generic design of ProteomicsDB, all analytical features available for the original human resource transfer seamlessly to other organisms. Furthermore, we introduce a new service in ProteomicsDB that allows users to upload their own expression datasets and analyze them alongside data stored in ProteomicsDB. Initially, users will be able to use this feature in the interactive heat map functionality and in the drug sensitivity prediction; ultimately, they will be able to use all analytical features of ProteomicsDB in this way.
