Double Empirical Bayes Testing

Analysing data from large-scale, multiexperiment studies requires scientists both to analyse each experiment and to assess the results as a whole. In this article, we develop double empirical Bayes testing (DEBT), an empirical Bayes method for analysing multiexperiment studies when many covariates are gathered per experiment. DEBT is a two-stage method: in the first stage, it reports which experiments yielded significant outcomes, and in the second stage, it hypothesises which covariates drive the experimental significance. In both stages, DEBT builds on the work of Efron, who laid out an elegant empirical Bayes approach to testing. DEBT enhances this framework by learning a series of black box predictive models to boost power while controlling the false discovery rate. In Stage 1, it uses a deep neural network prior to report which experiments yielded significant outcomes. In Stage 2, it uses an empirical Bayes version of the knockoff filter to select covariates that significantly predict Stage 1 significance. In both simulated and real data, DEBT increases the proportion of discovered significant outcomes and selects more features when signals are weak. In a real study of cancer cell lines, DEBT selects a robust set of biologically plausible genomic drivers of drug sensitivity and resistance in cancer.
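To make the empirical Bayes testing step concrete, the sketch below illustrates a generic two-groups calculation in the spirit of Efron's framework. It is not the DEBT implementation. It assumes a standard normal null, a Gaussian alternative, and a fixed non-null proportion `pi1`; the abstract says that Stage 1 of DEBT instead learns its prior with a deep neural network, and this sketch does not attempt that. The function names `two_groups_posterior` and `select_at_fdr` are hypothetical.

```python
import numpy as np


def two_groups_posterior(z, pi1, alt_mean, alt_sd):
    """Posterior probability that each z-score is non-null.

    Assumptions (not taken from the paper): N(0, 1) null, N(alt_mean, alt_sd^2)
    alternative, and a single fixed non-null proportion pi1.
    """
    z = np.asarray(z, dtype=float)
    null = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    alt = np.exp(-0.5 * ((z - alt_mean) / alt_sd) ** 2) / (alt_sd * np.sqrt(2.0 * np.pi))
    return pi1 * alt / (pi1 * alt + (1.0 - pi1) * null)


def select_at_fdr(posteriors, alpha):
    """Report the largest set whose mean local fdr (1 - posterior) is <= alpha."""
    posteriors = np.asarray(posteriors, dtype=float)
    local_fdr = 1.0 - posteriors
    order = np.argsort(local_fdr)  # most confident discoveries first
    # Running mean of the sorted local fdr estimates the FDR of the top-k set.
    running_fdr = np.cumsum(local_fdr[order]) / np.arange(1, len(order) + 1)
    passing = np.nonzero(running_fdr <= alpha)[0]
    k = passing[-1] + 1 if passing.size else 0
    selected = np.zeros(len(posteriors), dtype=bool)
    selected[order[:k]] = True
    return selected
```

Given posterior probabilities from any model, `select_at_fdr(posteriors, alpha=0.1)` returns a boolean mask of reported discoveries. Averaging the local false discovery rates over the reported set is what ties the per-experiment posteriors to a study-wide error rate.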
