Computational protein profile similarity screening for quantitative mass spectrometry experiments

MOTIVATION: The qualitative and quantitative characterization of protein abundance profiles over a series of time points or a set of environmental conditions is becoming increasingly important. Using isobaric mass tagging experiments, mass spectrometry-based quantitative proteomics delivers accurate peptide abundance profiles for relative quantitation. Associated data analysis workflows need to provide tailored statistical treatment that (i) takes the correlation structure of the normalized peptide abundance profiles into account and (ii) allows inference of protein-level similarity. We introduce a suitable distance measure for relative abundance profiles, derive a statistical test for equality and propose a protein-level representation of peptide-level measurements. This yields a workflow that delivers a similarity ranking of protein abundance profiles with respect to a defined reference. All procedures operate on the true correlation structure that underlies the measurements. This optimizes power and delivers more intuitive and efficient results than existing methods that do not take this structure into account.

RESULTS: We use protein profile similarity screening to identify candidate proteins whose abundances are post-transcriptionally controlled by the Anaphase Promoting Complex/Cyclosome (APC/C), a specific E3 ubiquitin ligase that is a master regulator of the cell cycle. Results are compared with an established protein correlation profiling method. The proposed procedure yields a 50.9-fold enrichment of co-regulated protein candidates and a 2.5-fold improvement over the previous method.

AVAILABILITY: A MATLAB toolbox is available from http://hci.iwr.uni-heidelberg.de/mip/proteomics.
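The idea of ranking profiles by distance to a reference can be illustrated with a minimal sketch. The abstract does not specify the distance measure, so the example below swaps in the Aitchison distance from compositional data analysis, which is one standard choice for relative (sum-normalized) profiles. It is an assumption for illustration, not the method of the MATLAB toolbox. The function names, protein labels and abundance values are all hypothetical.

```python
import math

def clr(profile):
    """Centered log-ratio transform of a strictly positive profile."""
    total = sum(profile)
    parts = [x / total for x in profile]        # closure: rescale to sum 1
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(a, b):
    """Euclidean distance between the clr-transformed profiles.

    Assumed stand-in for the paper's distance measure, which the
    abstract does not define.
    """
    ca, cb = clr(a), clr(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))

def rank_by_similarity(reference, profiles):
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, aitchison_distance(reference, p))
              for name, p in profiles.items()]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical abundance profiles over four conditions (made-up numbers).
reference = [1.0, 2.0, 4.0, 2.0]
profiles = {
    "A": [10.0, 20.0, 40.0, 20.0],   # same shape as the reference, 10x scale
    "B": [4.0, 2.0, 1.0, 2.0],       # opposite trend
    "C": [1.0, 2.0, 3.0, 2.0],       # similar shape, smaller peak
}
ranking = rank_by_similarity(reference, profiles)
for name, dist in ranking:
    print(name, round(dist, 3))
```

Because the profiles are rescaled before the log-ratio transform, the distance depends only on the shape of a profile and not on its absolute intensity, which is the property a relative quantitation setting calls for. The sketch does not cover the statistical test for equality or the peptide-to-protein aggregation described in the abstract.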
