Weighted False Discovery Rate Control in Large-Scale Multiple Testing

ABSTRACT The use of weights provides an effective strategy to incorporate prior domain knowledge in large-scale inference. This article studies weighted multiple testing in a decision-theoretical framework. We develop oracle and data-driven procedures that aim to maximize the expected number of true positives subject to a constraint on the weighted false discovery rate. The asymptotic validity and optimality of the proposed methods are established. The results demonstrate that incorporating informative domain knowledge enhances the interpretability of results and precision of inference. Simulation studies show that the proposed method controls the error rate at the nominal level, and the gain in power over existing methods is substantial in many settings. An application to a genome-wide association study is discussed. Supplementary materials for this article are available online.
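To make the idea of weighting concrete, the following is a minimal sketch of a simple p-value weighting scheme: each p-value is divided by a nonnegative weight (rescaled so the weights average one), and the Benjamini-Hochberg step-up rule is applied to the weighted p-values. This is not the oracle or data-driven procedure developed in the article, which targets a weighted false discovery rate in a decision-theoretic framework; it is a standard weighted variant of Benjamini-Hochberg shown only for illustration. The function name `weighted_bh`, its parameters, and the toy inputs are assumptions introduced here.

```python
import numpy as np


def weighted_bh(pvalues, weights, alpha=0.1):
    """Illustrative weighted Benjamini-Hochberg step-up rule (hypothetical helper).

    Each p-value is divided by its weight, after rescaling the weights to
    average one, and the usual BH step-up rule is applied to the result.
    Returns a boolean array marking the rejected hypotheses.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.ndim != 1 or p.shape != w.shape:
        raise ValueError("pvalues and weights must be 1-D arrays of equal length")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")

    w = w / w.mean()  # rescale so that the weights average one
    # A zero weight means the hypothesis can never be rejected.
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.where(w > 0, p / w, np.inf)

    m = p.size
    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = q[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


# Toy example (made-up numbers): prior knowledge up-weights the first hypothesis.
pvals = [0.04, 0.04, 0.5, 0.5]
print(weighted_bh(pvals, [1, 1, 1, 1], alpha=0.05))
print(weighted_bh(pvals, [3.4, 0.2, 0.2, 0.2], alpha=0.05))
```

In the toy example, equal weights reproduce the ordinary Benjamini-Hochberg procedure, while concentrating weight on the first hypothesis lets it be rejected at the same nominal level, at the cost of making the down-weighted hypotheses harder to reject.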
