Two-group comparisons of zero-inflated intensity values: the choice of test statistic matters

MOTIVATION: A special characteristic of data from molecular biology is the frequent occurrence of zero intensity values, which can arise either from the true absence of a compound or from a signal that is below a technical limit of detection.

RESULTS: While so-called two-part tests compare mixture distributions between groups, one-part tests treat the zero-inflated distributions as left-censored. The left-inflated mixture model combines these two approaches. Both types of distributional assumptions, and combinations of both, are considered in a simulation study to compare power and estimation of log fold change. We discuss issues of application using an example from peptidomics. The considered tests generally perform best in scenarios satisfying their respective distributional assumptions. In the absence of distributional assumptions, the two-part Wilcoxon test or the empirical likelihood ratio test is recommended. Assuming a log-normal subdistribution, the left-inflated mixture model provides estimates for the proportions of the two considered types of zero intensities.

AVAILABILITY: R code is available at http://cemsiis.meduniwien.ac.at/en/kb/science-research/software/
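
To make the idea of a two-part test concrete, the following is a minimal sketch in Python and not the authors' R code. It assumes the common construction in which a squared z statistic comparing the proportions of zeros is added to a squared Wilcoxon rank-sum z statistic on the nonzero values, and the sum is referred to a chi-squared distribution with 2 degrees of freedom. The function names (`two_part_wilcoxon`, `_proportion_z`, `_rank_sum_z`) are hypothetical, and normal approximations are used throughout, so the result is only approximate for small samples.

```python
import math


def _proportion_z(zeros_a, n_a, zeros_b, n_b):
    """z statistic for the difference in zero proportions (pooled variance)."""
    pooled = (zeros_a + zeros_b) / (n_a + n_b)
    var = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if var == 0:  # all zeros or no zeros at all: this part carries no information
        return 0.0
    return (zeros_a / n_a - zeros_b / n_b) / math.sqrt(var)


def _rank_sum_z(x, y):
    """Wilcoxon rank-sum z statistic with midranks and tie-corrected variance."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:  # nothing to compare among the nonzero values
        return 0.0
    pooled = sorted(x + y)
    ranks, tie_term, i = {}, 0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    n = n1 + n2
    w = sum(ranks[v] for v in x)
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 0.0
    return (w - mean) / math.sqrt(var)


def two_part_wilcoxon(a, b):
    """Return (statistic, approximate p-value) for two lists of intensities.

    Simplification (an assumption of this sketch): a degenerate part
    contributes 0 and the reference distribution stays chi-squared with
    2 degrees of freedom, which is conservative in that case.
    """
    pos_a = [v for v in a if v > 0]
    pos_b = [v for v in b if v > 0]
    z_binom = _proportion_z(len(a) - len(pos_a), len(a),
                            len(b) - len(pos_b), len(b))
    z_wilcox = _rank_sum_z(pos_a, pos_b)
    stat = z_binom ** 2 + z_wilcox ** 2
    # The chi-squared survival function with 2 df has the closed form exp(-x/2).
    return stat, math.exp(-stat / 2)
```

Calling `two_part_wilcoxon(group_a, group_b)` on two lists of non-negative intensities returns the combined statistic and its approximate p-value. Because the two squared components are summed, the statistic can detect a group difference in the proportion of zeros, in the distribution of the nonzero values, or in both.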
