False Discovery Rate Control Under Reduced Precision Computation

Mitigating false positives is a central concern in multiple hypothesis testing. In high-dimensional applications, the most popular approach is to control the false discovery rate (FDR). Multiple testing data from neuroimaging experiments can be very large, and such data must often be stored at reduced precision. We present a method for FDR control that applies when only p-values or test statistics (with a common and known null distribution) are available, and when those p-values or test statistics are encoded in a reduced precision format. Our method is based on an empirical-Bayes paradigm in which the probit transformations of the p-values (called the z-scores) are modeled as a two-component mixture of normal distributions. Because of the reduced precision of the p-values or test statistics, the usual approach to fitting mixture models may not be feasible. We instead use a binned-data technique, which can be proved to consistently estimate the z-score distribution parameters under mild correlation assumptions of the kind that often hold for neuroimaging data. A simulation study shows that our methodology is competitive with popular alternatives, especially in the presence of misspecification. We demonstrate the applicability of our methodology in practice via a brain imaging study of mice.
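The pipeline described above (probit transform, binning, two-component normal mixture) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bin edges, the three-decimal rounding used to mimic reduced precision, and the simulated mixture parameters are all assumptions chosen for illustration. The sketch only evaluates the binned-data (multinomial) log-likelihood for given parameters; a full fit would maximize it, for example with an EM algorithm for binned data.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def probit(p, eps=1e-12):
    """Map a p-value to a z-score via the standard normal quantile function."""
    # Rounded p-values can be exactly 0 or 1; clamp so the quantile is finite.
    p = min(max(p, eps), 1.0 - eps)
    return STD_NORMAL.inv_cdf(p)

def bin_counts(z_scores, edges):
    """Count z-scores in each interval [edges[j], edges[j + 1])."""
    counts = [0] * (len(edges) - 1)
    for z in z_scores:
        for j in range(len(counts)):
            if edges[j] <= z < edges[j + 1]:
                counts[j] += 1
                break
    return counts

def binned_loglik(counts, edges, pi0, null, alt):
    """Multinomial log-likelihood of the bin counts under the mixture
    pi0 * null + (1 - pi0) * alt, up to an additive constant."""
    total = 0.0
    for j, n in enumerate(counts):
        lo, hi = edges[j], edges[j + 1]
        mass = (pi0 * (null.cdf(hi) - null.cdf(lo))
                + (1.0 - pi0) * (alt.cdf(hi) - alt.cdf(lo)))
        if n > 0:
            total += n * math.log(max(mass, 1e-300))
    return total

def simulate_rounded_p_values(n, pi0=0.8, alt_mean=-2.5, digits=3, seed=0):
    """Hypothetical data: z-scores from a two-component normal mixture,
    converted to p-values and rounded to mimic reduced precision storage."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n):
        mean = 0.0 if rng.random() < pi0 else alt_mean
        z = rng.gauss(mean, 1.0)
        p_values.append(round(STD_NORMAL.cdf(z), digits))
    return p_values

# Open-ended tail bins absorb the z-scores produced by p-values rounded to 0 or 1.
EDGES = [-math.inf] + [-3.0 + 0.5 * k for k in range(13)] + [math.inf]

if __name__ == "__main__":
    p_values = simulate_rounded_p_values(5000)
    counts = bin_counts([probit(p) for p in p_values], EDGES)
    ll_true = binned_loglik(counts, EDGES, 0.8, STD_NORMAL, NormalDist(-2.5, 1.0))
    ll_bad = binned_loglik(counts, EDGES, 0.5, STD_NORMAL, NormalDist(2.0, 1.0))
    print("log-likelihood near simulated parameters:", ll_true)
    print("log-likelihood at a poor guess:", ll_bad)
```

Working with bin counts rather than individual z-scores means the likelihood depends only on which interval each observation falls in, so ties and boundary values created by rounding do not need special handling beyond the open-ended tail bins.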
