A Foundation for Reliable Spatial Proteomics Data Analysis

Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis.
