Selection Bias Correction and Effect Size Estimation under Dependence

We consider large-scale studies in which it is of interest to test a very large number of hypotheses, and then to estimate the effect sizes corresponding to the rejected hypotheses. For instance, this setting arises in the analysis of gene expression or DNA sequencing data. However, naive estimates of the effect sizes suffer from selection bias, i.e., some of the largest naive estimates are large due to chance alone. Many authors have proposed methods to reduce the effects of selection bias under the assumption that the naive estimates of the effect sizes are independent. Unfortunately, when the effect size estimates are dependent, these existing techniques can perform very poorly, and in practice dependence is often present. We propose an estimator that adjusts for selection bias under a recently proposed frequentist framework, without the independence assumption. We study some properties of the proposed estimator, and illustrate that it outperforms past proposals in a simulation study and on two gene expression data sets.
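The selection bias described above can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the paper: true effects and noise are both drawn as independent standard normals, and the function name `selection_bias_demo` and its parameters are hypothetical. It shows the bias in the naive estimates, not the proposed estimator, and it does not model dependence.

```python
import random
import statistics


def selection_bias_demo(n_effects=2000, top_k=50, seed=0):
    """Compare naive estimates with true effects among the top-k selected.

    Assumption (illustrative only): true effects ~ N(0, 1) and each naive
    estimate equals the true effect plus independent N(0, 1) noise.
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_effects)]
    naive = [mu + rng.gauss(0.0, 1.0) for mu in true_effects]

    # Select the indices of the k largest naive estimates.
    order = sorted(range(n_effects), key=lambda i: naive[i], reverse=True)
    selected = order[:top_k]

    mean_naive = statistics.fmean(naive[i] for i in selected)
    mean_true = statistics.fmean(true_effects[i] for i in selected)
    return mean_naive, mean_true


mean_naive, mean_true = selection_bias_demo()
print(f"mean naive estimate among selected: {mean_naive:.2f}")
print(f"mean true effect among selected:    {mean_true:.2f}")
```

Among the selected effects, the average naive estimate exceeds the average true effect, because selection favors estimates whose noise happened to be positive.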
