Visualization of proteomics data using R and Bioconductor

Data visualization plays a key role in high-throughput biology. It is an essential tool for data exploration, shedding light on data structure and patterns of interest. Visualization is also of paramount importance as a means of communicating data to a broad audience. Here, we provide a short overview of the application of the R software to the visualization of proteomics data. We present a summary of R's plotting systems and how they are used to visualize and understand raw and processed MS-based proteomics data.
