On Estimating Many Means, Selection Bias, and the Bootstrap

Abstract

With recent advances in high-throughput technology, researchers often find themselves running a large number of hypothesis tests (thousands+) and estimating a large number of effect-sizes. Generally there is particular interest in those effects estimated to be most extreme. Unfortunately, naive estimates of these effect-sizes (even after potentially accounting for multiplicity in a testing procedure) can be severely biased. In this manuscript we explore this bias from a frequentist perspective: we give a formal definition, and show that an oracle estimator using this bias dominates the naive maximum likelihood estimate. We give a resampling estimator to approximate this oracle, and show that it works well on simulated data. We also connect this to ideas in empirical Bayes.

Keywords: bootstrap, shrinkage, mean, empirical Bayes, James-Stein, regression to the mean, selection bias, compound decision theory

1 Introduction

Often, in modern applications, researchers are interested in testing and estimating effect sizes for many different features at once. In the simplest cases one is interested