Selection Bias Correction and Effect Size Estimation under Dependence

We consider large-scale studies in which it is of interest to test a very large number of hypotheses, and then to estimate the effect sizes corresponding to the rejected hypotheses. For instance, this setting arises in the analysis of gene expression or DNA sequencing data. However, naive estimates of the effect sizes suffer from selection bias, i.e., some of the largest naive estimates are large due to chance alone. Many authors have proposed methods to reduce the effects of selection bias under the assumption that the naive estimates of the effect sizes are independent. Unfortunately, when the effect size estimates are dependent, these existing techniques can have very poor performance, and in practice there will often be dependence. We propose an estimator that adjusts for selection bias under a recently proposed frequentist framework, without the independence assumption. We study some properties of the proposed estimator, and illustrate that it outperforms past proposals in a simulation study and on two gene expression data sets.
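The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the estimator proposed in the paper. It assumes a hypothetical setup (standard normal true effects, independent standard normal noise, selection of the 20 largest naive estimates) and only shows that the selected naive estimates tend to overstate the corresponding true effects. The function name and all parameter values are illustrative assumptions.

```python
import random


def simulate_selection_bias(n_effects=2000, top_k=20, seed=0):
    """Compare naive estimates with true effects among the top_k selected.

    Assumed (hypothetical) model: true effects ~ N(0, 1), and each naive
    estimate equals its true effect plus independent N(0, 1) noise.
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_effects)]
    naive = [mu + rng.gauss(0.0, 1.0) for mu in true_effects]

    # Select the indices of the top_k largest naive estimates.
    selected = sorted(range(n_effects), key=lambda i: naive[i], reverse=True)[:top_k]

    mean_naive = sum(naive[i] for i in selected) / top_k
    mean_true = sum(true_effects[i] for i in selected) / top_k
    return mean_naive, mean_true


mean_naive, mean_true = simulate_selection_bias()
print(f"mean naive estimate among selected: {mean_naive:.2f}")
print(f"mean true effect among selected:    {mean_true:.2f}")
```

Under these assumptions the first number should be clearly larger than the second, because the selected estimates are partly large due to noise alone. The noise here is independent across effects, so the sketch does not reproduce the dependent setting that the paper addresses.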

Noah Simon | Kean Ming Tan | Daniela Witten
