On the beta-binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics

MOTIVATION Spectral count data generated from label-free tandem mass spectrometry-based proteomic experiments can be used to quantify protein abundances reliably. Comparing spectral count data from different sample groups, such as control and disease, is an essential step in the statistical analysis used to determine altered protein levels and to discover biomarkers. Fisher's exact test, the G-test, the t-test and the local-pooled-error technique (LPE) are commonly used for differential analysis of spectral count data. However, our initial experiments in two cancer studies show that the current methods fail to declare as significant, at the 95% confidence level, a number of protein markers that have been judged to be differential on the basis of the biology of the disease and the spectral count numbers. A shortcoming of these tests is that they do not take into account within- and between-sample variations together. Hence, our aim is to improve upon existing techniques by incorporating both the within- and between-sample variations.

RESULTS We propose to use the beta-binomial distribution to test the significance of differential protein abundances expressed in spectral counts in label-free mass spectrometry-based proteomics. The beta-binomial test naturally normalizes for total sample count. Experimental results show that the beta-binomial test performs favorably in comparison with other methods on several datasets in terms of both true detection rate and false positive rate. In addition, it can be applied to experiments with one or more replicates, and to multiple condition comparisons. Finally, we have implemented a software package for parameter estimation of two beta-binomial models and the associated statistical tests.

AVAILABILITY AND IMPLEMENTATION A software package implemented in R is freely available for download at http://www.oncoproteomics.nl/.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
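To illustrate how a beta-binomial model can combine within-sample and between-sample variation, the following is a minimal sketch in Python using only the standard library. It is not the R package described above. The (alpha, beta) parametrization, the function names and the example numbers are assumptions chosen for illustration. Here a protein's spectral count k in a sample with total count n is treated as binomial given a per-sample proportion, and that proportion is assumed to vary across samples according to a Beta(alpha, beta) distribution.

```python
from math import lgamma, exp

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def betabinom_logpmf(k, n, alpha, beta):
    """Log-probability of observing k spectral counts for one protein
    out of n total counts in a sample (hypothetical helper)."""
    return (log_choose(n, k)
            + log_beta_fn(k + alpha, n - k + beta)
            - log_beta_fn(alpha, beta))

def betabinom_mean_var(n, alpha, beta):
    """Mean and variance of the beta-binomial count.

    pi is the mean proportion; rho is the between-sample correlation
    that inflates the variance beyond the plain binomial value."""
    pi = alpha / (alpha + beta)
    rho = 1.0 / (alpha + beta + 1.0)
    mean = n * pi
    var = n * pi * (1.0 - pi) * (1.0 + (n - 1) * rho)
    return mean, var

# Illustrative (made-up) numbers: a sample with 1000 total spectral
# counts and a protein whose mean proportion is 2 / (2 + 198) = 0.01.
n_total, a, b = 1000, 2.0, 198.0
bb_mean, bb_var = betabinom_mean_var(n_total, a, b)
binom_var = n_total * 0.01 * 0.99  # within-sample variation only

# The pmf sums to one over all possible counts 0..n_total.
pmf_total = sum(exp(betabinom_logpmf(k, n_total, a, b))
                for k in range(n_total + 1))
```

In this sketch the variance of the beta-binomial count exceeds the binomial variance with the same mean proportion, which is the extra between-sample variation that a purely binomial comparison would not capture. Because the count is modelled relative to the sample's total count n, the model also works on proportions rather than raw counts.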
