Computationally efficient permutation-based confidence interval estimation for tail-area FDR

The difficulty of satisfying parametric assumptions in genomic settings with thousands or millions of tests has led investigators to combine powerful false discovery rate (FDR) approaches with computationally expensive but exact permutation testing. We describe a computationally efficient permutation-based approach that includes tractable estimators of the proportion of true null hypotheses and of the variance of the log of the tail-area FDR, together with a confidence interval (CI) estimator that accounts for both the number of permutations conducted and dependencies between tests. The CI estimator models the counts of positive tests with a binomial distribution and an overdispersion parameter. The approach is general with regard to the distribution of the test statistic, performs favorably compared with other approaches, and is shown to yield reliable FDR estimates with as few as 10 permutations. Applying this approach to relate sleep patterns to gene expression patterns in the mouse hypothalamus yielded a set of 11 transcripts associated with 24 h REM sleep [FDR = 0.15 (0.08, 0.26)]. Two of the corresponding genes, Sfrp1 and Sfrp4, are involved in Wnt signaling, and several others (Irf7, Ifit1, Iigp2, and Ifih1) have links to interferon signaling. These genes would have been overlooked had a typical a priori FDR threshold such as 0.05 or 0.1 been applied. The CI provides flexibility in choosing a significance threshold based on both the tolerance for false discoveries and the precision of the FDR estimate. It thus frees the investigator to define significance in a more data-driven way, for example by selecting the threshold with the minimum estimated FDR, an option that is especially useful for the weak effects often observed in studies of complex diseases.
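The abstract names the ingredients (a permutation null for the tail-area FDR, an estimate of the proportion of true nulls, and a binomial CI with an overdispersion parameter on the log scale) but not the formulas, so the Python sketch below is an illustration under stated assumptions, not the authors' estimator. It assumes larger statistics are more extreme. It estimates the null proportion `pi0` as the number of observed statistics below the pooled permutation median divided by the average permuted count below it, and it treats `pi0` as fixed when computing the variance. It also estimates the overdispersion factor `phi` as the variance of the per-permutation positive counts divided by its binomial value. The names `tail_fdr_ci` and `diff_stats` are hypothetical.

```python
import math
import random
from statistics import NormalDist, median, variance


def tail_fdr_ci(obs, perms, t, alpha=0.05):
    """Sketch of a permutation-based tail-area FDR with a log-scale CI.

    obs   : observed statistics (larger = more extreme), length m
    perms : B permuted statistic vectors, each of length m (B >= 2)
    t     : threshold; a test is "positive" when its statistic >= t
    Returns (fdr, lower, upper, pi0, phi).
    """
    m, B = len(obs), len(perms)
    if B < 2:
        raise ValueError("need at least 2 permutations")

    # Counts of positive tests: observed, and in each permutation.
    s_obs = sum(x >= t for x in obs)
    s_null = [sum(x >= t for x in p) for p in perms]
    s0_bar = sum(s_null) / B
    if s_obs == 0 or s0_bar == 0:
        raise ValueError("no positive tests at this threshold")

    # Assumed pi0 estimator: observed statistics below the pooled null
    # median, relative to the average permuted count below it (about m/2).
    q = median(x for p in perms for x in p)
    n_null_below = sum(x < q for p in perms for x in p) / B
    pi0 = min(1.0, sum(x < q for x in obs) / n_null_below)

    fdr = pi0 * s0_bar / s_obs

    # Assumed overdispersion factor: with independent tests each permutation
    # count is roughly Binomial(m, p0); correlation inflates its variance.
    p0 = s0_bar / m
    binom_var = m * p0 * (1 - p0)
    phi = max(1.0, variance(s_null) / binom_var) if binom_var > 0 else 1.0

    # Delta-method variance of log FDR from two binomial counts
    # (observed positives, pooled permuted positives), inflated by phi.
    t0 = sum(s_null)
    var_log = phi * ((1 - s_obs / m) / s_obs + (1 - t0 / (B * m)) / t0)
    half = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var_log)
    return fdr, fdr * math.exp(-half), fdr * math.exp(half), pi0, phi


def diff_stats(data, labels):
    """|mean(group 1) - mean(group 0)| for each row (gene) of data."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    out = []
    for row in data:
        s1 = sum(v for v, g in zip(row, labels) if g)
        out.append(abs(s1 / n1 - (sum(row) - s1) / n0))
    return out


# Simulated example: 500 genes, 10 + 10 samples, first 50 genes shifted.
rng = random.Random(1)
m, B = 500, 10
labels = [0] * 10 + [1] * 10
data = [[rng.gauss(1.5 if (i < 50 and g) else 0.0, 1.0) for g in labels]
        for i in range(m)]
obs = diff_stats(data, labels)
perms = []
for _ in range(B):
    lab = labels[:]
    rng.shuffle(lab)  # one label shuffle, applied to every gene
    perms.append(diff_stats(data, lab))

fdr, lo, hi, pi0, phi = tail_fdr_ci(obs, perms, t=1.0)
print(f"FDR = {fdr:.2f} ({lo:.2f}, {hi:.2f}), pi0 = {pi0:.2f}, phi = {phi:.2f}")
```

Shuffling the sample labels once per permutation and applying the same shuffle to every gene preserves the gene-gene correlation in the permutation null, which is the kind of dependence the overdispersion factor is meant to absorb. Evaluating `tail_fdr_ci` over a grid of thresholds `t` and inspecting `(fdr, lo, hi)` at each one is a simple way to make the data-driven threshold choice described above.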
