Double-sampling designs to reduce the non-discovery rate. Application to microarray data

Simultaneous testing of a huge number of hypotheses is a core issue in high-throughput experimental methods such as microarrays for transcriptomic data. In the central debate about the type I error rate, Benjamini and Hochberg (1995) proposed a procedure that is shown to control the now popular False Discovery Rate (FDR) under the assumption of independence between the test statistics. These results were extended to a larger class of dependency structures by Benjamini and Yekutieli (2001), and improvements have emerged in recent years, among which step-up procedures have shown desirable properties. The present paper focuses on the type II error rate. The proposed method improves power by means of double-sampling test statistics that integrate external information available both on the sample for which the outcomes are measured and on additional items. The small-sample distribution of the test statistics is provided, and simulation studies are used to show the beneficial impact of introducing relevant covariates into the testing strategy. Finally, the method is applied to a situation where microarray data are used to select the genes that affect the degree of muscle destructuration in pigs. A phenotypic covariate is introduced in the analysis to improve the search for differentially expressed genes.
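
For readers unfamiliar with the FDR-controlling procedure of Benjamini and Hochberg (1995) mentioned above, the following is a minimal sketch of the standard linear step-up rule: sort the m p-values, find the largest rank k such that p_(k) <= k * alpha / m, and reject the k hypotheses with the smallest p-values. It is not the double-sampling method proposed in this paper. The function name `benjamini_hochberg` and the default `alpha=0.05` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg linear step-up rule (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    The function name and default alpha are assumptions for illustration.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Step-up: reject every hypothesis ranked at or below k_max,
    # even if some intermediate p-value exceeded its own threshold.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, alpha=0.05))
```

Because the rule is step-up, a hypothesis whose p-value exceeds its own threshold is still rejected whenever a larger-ranked p-value falls below its threshold.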

[1] Christopher R. Genovese, et al. Operating Characteristics and Extensions of the FDR Procedure, 2001.

[2] Robert Gentleman, et al. Differential expression with the Bioconductor Project, 2005.

[3] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[4] Ingrid Lönnstedt. Replicated microarray data, 2001.

[5] Norman E. Breslow, et al. Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling, 2003.

[6] S. Dudoit, et al. Multiple Hypothesis Testing in Microarray Experiments, 2003.

[7] Y. Benjamini, et al. The control of the false discovery rate in multiple testing under dependency, 2001.

[8] D. Conniffe. Estimating regression equations with common explanatory variables but unequal numbers of observations, 1985.

[9] Y. Benjamini, et al. Adaptive linear step-up procedures that control the false discovery rate, 2006.

[10] John D. Storey, et al. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach, 2004.

[11] Gordon K. Smyth, et al. Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments, 2004, Statistical Applications in Genetics and Molecular Biology.

[12] C. M. Kendziorski, et al. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles, 2003, Statistics in Medicine.

[13] A. Owen. Variance of the number of false discoveries, 2005.

[15] Shuying S. Li, et al. FDR-controlling testing procedures and sample size determination for microarrays, 2005, Statistics in Medicine.