False Discovery Rate Smoothing

ABSTRACT We present false discovery rate (FDR) smoothing, an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. FDR smoothing automatically finds spatially localized regions of significant test statistics. It then relaxes the threshold of statistical significance within these regions, and tightens it elsewhere, in a manner that controls the overall false discovery rate at a given level. This results in increased power and cleaner spatial separation of signals from noise. The approach requires solving a nonstandard high-dimensional optimization problem, for which an efficient augmented-Lagrangian algorithm is presented. In simulation studies, FDR smoothing exhibits state-of-the-art performance at modest computational cost. In particular, it is shown to be far more robust than existing methods for spatially dependent multiple testing. We also apply the method to a dataset from an fMRI experiment on spatial working memory, where it detects patterns that are much more biologically plausible than those detected by standard FDR-controlling methods. All code for FDR smoothing is publicly available in Python and R (https://github.com/tansey/smoothfdr). Supplementary materials for this article are available online.
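The idea of relaxing the significance threshold inside signal regions and tightening it elsewhere can be illustrated with a small sketch. It assumes a two-groups mixture model in which each site has its own prior probability of being a signal. Here that prior is simply supplied by hand, whereas FDR smoothing estimates it by solving the optimization problem mentioned above. The function names, the N(0, 1) null, and the N(2, 1) alternative are illustrative assumptions, not the API of the smoothfdr package.

```python
import numpy as np

def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def posterior_signal_prob(z, prior, alt_mean=2.0, alt_sd=1.0):
    """Posterior probability that each site is a signal.

    `prior` holds the site-specific prior signal probability. The null is
    N(0, 1); the alternative N(alt_mean, alt_sd) is an assumed placeholder.
    """
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, alt_mean, alt_sd)
    return prior * f1 / (prior * f1 + (1.0 - prior) * f0)

def bayesian_fdr_discoveries(post, alpha=0.1):
    """Reject the largest set whose average posterior null probability is <= alpha."""
    order = np.argsort(-post)          # most probable signals first
    local_fdr = 1.0 - post[order]      # ascending posterior null probabilities
    # Running mean of an ascending sequence is non-decreasing, so it can be searched.
    running_fdr = np.cumsum(local_fdr) / np.arange(1, post.size + 1)
    k = np.searchsorted(running_fdr, alpha, side="right")
    reject = np.zeros(post.size, dtype=bool)
    reject[order[:k]] = True
    return reject

# Two sites with the same test statistic but different local priors.
z = np.array([2.0, 2.0])
prior = np.array([0.5, 0.05])          # site 0 lies in a hypothetical signal region
post = posterior_signal_prob(z, prior)
reject = bayesian_fdr_discoveries(post, alpha=0.15)
```

In this toy case the same z-score is declared a discovery at the site with the higher local prior and not at the other, which is the sense in which the effective threshold varies over space while the average posterior null probability among the discoveries stays at or below the target level.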
