Supplementary Text: Capturing Heterogeneity in Gene Expression Studies

Nested KS-tests: a Procedure to Test Whether a Procedure Is Valid

The false discovery rate (FDR) has been discussed extensively, and it has been pointed out that the distribution of the null p-values must be “correct” or conservative for FDR estimation, or any other standard measure of statistical significance, to behave properly. The null p-values have a correct distribution if they are Uniformly distributed on the interval (0,1). They have a conservative distribution if they are pushed towards 1 relative to the Uniform(0,1). P-values are constructed to have the Uniform distribution property under the null hypothesis, and if this cannot be done exactly, the conservative version is calculated [1].

Even in a simulation study where the right answer is known, there is no off-the-shelf approach to test whether the null p-values have a proper distribution. In this study, we apply a Kolmogorov-Smirnov (KS) test to the set of null p-values to detect deviation from the Uniform. However, we want to test whether Uniformity holds over many repeated simulations, to avoid “getting lucky” on one particular simulated data set. If the set of null p-values is Uniform, then the p-value resulting from the KS test should also follow the Uniform distribution. Therefore, by collecting the KS test p-values over all simulations, we can apply a second KS test to verify that these are themselves Uniformly distributed. Here we have employed this nested KS test to compare the relative behavior of each multiple testing procedure discussed. If the quantiles of the KS test p-values follow the diagonal line in a quantile-quantile plot against the quantiles of the Uniform distribution, then this is very strong evidence that the p-values resulting from the procedure are “correct.”
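The nested KS test described above can be illustrated with a minimal sketch in Python. It is not the code used in this study. All function names are hypothetical, the simulated data are a stand-in (independent two-sided z-tests under a true null) rather than the gene expression simulations, and the KS p-value is computed from the asymptotic Kolmogorov distribution with Stephens' small-sample correction, which is an assumption made to keep the example self-contained.

```python
import math
import random


def ks_uniform_pvalue(values):
    """Approximate one-sample KS p-value for H0: values ~ Uniform(0,1)."""
    xs = sorted(values)
    n = len(xs)
    # KS statistic: largest gap between the empirical CDF and F(x) = x.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    d = max(d_plus, d_minus)
    # Asymptotic Kolmogorov tail with Stephens' correction (an approximation).
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the tail probability is numerically 1 here
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def simulate_null_pvalues(m, rng):
    """Hypothetical stand-in for one simulated data set with m true nulls."""
    # Two-sided p-value of a standard normal statistic: erfc(|z| / sqrt(2)).
    return [math.erfc(abs(rng.gauss(0.0, 1.0)) / math.sqrt(2.0))
            for _ in range(m)]


def nested_ks(n_sims=200, m=1000, seed=0, distort=None):
    """Return (inner KS p-values, outer KS p-value) over n_sims simulations.

    `distort` optionally transforms each null p-value, to mimic a procedure
    whose null p-values are not Uniform.
    """
    rng = random.Random(seed)
    inner = []
    for _ in range(n_sims):
        p = simulate_null_pvalues(m, rng)
        if distort is not None:
            p = [distort(v) for v in p]
        # Inner test: are this simulation's null p-values Uniform?
        inner.append(ks_uniform_pvalue(p))
    # Outer test: are the inner KS p-values themselves Uniform?
    return inner, ks_uniform_pvalue(inner)


def uniform_qq_points(pvalues):
    """Pairs (Uniform quantile, observed quantile) for a Q-Q plot."""
    xs = sorted(pvalues)
    n = len(xs)
    return [((i + 0.5) / n, x) for i, x in enumerate(xs)]


if __name__ == "__main__":
    inner, outer = nested_ks()
    print("outer KS p-value, valid procedure:", outer)
    # Squaring pushes p-values towards 0 (anti-conservative).
    _, outer_bad = nested_ks(distort=lambda v: v * v)
    print("outer KS p-value, distorted procedure:", outer_bad)
```

For a valid procedure the outer p-value is itself a draw from the Uniform(0,1), so a single value is not informative on its own; what the sketch is meant to show is the contrast with the distorted case, where the outer p-value collapses towards 0. The points returned by `uniform_qq_points(inner)` can be plotted against the diagonal to obtain the quantile-quantile comparison described in the text.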