Biological experiments often involve hypothesis testing at the scale of thousands to millions of tests. Alleviating the multiple-testing burden has been a goal of many methods designed to boost test power by focusing tests on the alternative hypotheses most likely to be true. Very often, these methods explicitly or implicitly use prior probabilities that bias significance toward favored sets thought to be enriched for significant findings. Nevertheless, most genomics experiments, and in particular genome-wide association studies (GWAS), still use traditional univariate tests rather than more sophisticated approaches. Here we use GWAS to demonstrate why unbiased tests remain in favor. We calculate test power assuming perfect knowledge of a prior distribution and then derive the population size increase required to provide the same boost without a prior. We show that population size is exponentially more important than the prior, providing a rigorous explanation for the observed avoidance of prior-based methods.

Author summary

Biological experiments often test thousands to millions of hypotheses. Gene-based tests for human RNA-Seq data, for example, involve approximately 20,000 tests; genome-wide association studies (GWAS) involve about 1 million effective tests. The conventional approach is to perform individual tests and then apply a Bonferroni correction to account for multiple testing. To control the family-wise false-positive rate at the conventional value of 0.05, this approach implies a single-test p-value threshold of 2.5 × 10⁻⁶ for RNA-Seq experiments and 5 × 10⁻⁸ for GWAS. Many methods have been proposed to alleviate the multiple-testing burden by incorporating a prior probability that boosts the significance for a subset of candidate genes or variants. At the extreme limit, only the candidate set is tested, corresponding to a decreased multiple-testing burden. Despite decades of methods development, prior-based tests have not come into general use.
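The Bonferroni arithmetic behind the two thresholds quoted above can be checked with a minimal sketch. The function name `bonferroni_threshold` is a hypothetical helper introduced here for illustration; the test counts and the 0.05 family-wise rate are the ones stated in the text.

```python
def bonferroni_threshold(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold under a Bonferroni correction.

    Hypothetical helper for illustration: divides the family-wise
    error rate evenly across all tests.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


# Test counts taken from the text: ~20,000 gene-based RNA-Seq tests
# and ~1 million effective GWAS tests, both at a family-wise rate of 0.05.
rnaseq_threshold = bonferroni_threshold(0.05, 20_000)
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)

print(f"RNA-Seq per-test threshold: {rnaseq_threshold:.1e}")
print(f"GWAS per-test threshold:    {gwas_threshold:.1e}")
```

Restricting testing to a candidate set corresponds to calling the same function with a smaller `n_tests`, which raises the per-test threshold in direct proportion.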
Here we compare the power increase possible with a prior against the increase possible with the much simpler strategy of increasing the study size. We show that increasing the population size is exponentially more valuable than increasing the strength of the prior, even when the true prior is known exactly. These results provide a rigorous explanation for the continued use of simple, robust methods rather than more sophisticated approaches.
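One way to build intuition for this trade-off is a textbook normal-approximation power calculation. This is a sketch under stated assumptions, not the derivation used in this work. It assumes a two-sided z-test at fixed effect size, for which the sample size needed to reach a given power is proportional to (z_alpha + z_power)², and it models a prior as a fold relaxation of the per-test significance threshold for a favored variant. The helper names `z_threshold` and `relative_sample_size` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_threshold(alpha: float) -> float:
    """Two-sided z-score cutoff for per-test significance level alpha."""
    return -_STD_NORMAL.inv_cdf(alpha / 2.0)


def relative_sample_size(alpha: float, fold_relaxation: float,
                         power: float = 0.8) -> float:
    """Ratio N(alpha) / N(alpha * fold_relaxation) at equal power.

    Assumption (not from the source): required N is proportional to
    (z_alpha + z_power)^2 for a fixed effect size.
    """
    z_power = _STD_NORMAL.inv_cdf(power)
    strict = z_threshold(alpha) + z_power
    relaxed = z_threshold(alpha * fold_relaxation) + z_power
    return (strict / relaxed) ** 2


# How much larger must an unweighted study be to match a prior that
# relaxes the GWAS threshold of 5e-8 by 10-, 100-, or 1000-fold?
for fold in (10, 100, 1000):
    ratio = relative_sample_size(5e-8, fold)
    print(f"{fold:>5}-fold relaxation ~ {ratio:.2f}x sample size")
```

Under these assumptions the z-score cutoff grows only about as the square root of the logarithm of 1/alpha, so each tenfold strengthening of the prior is matched by a modest multiplicative increase in sample size.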