The Use of Multiplicity Corrections, Order Statistics and Generalized Family-Wise Statistics with Application to Genome-Wide Studies

The most important decision faced by large-scale studies, such as those presently encountered in human genetics, is to distinguish tests that are true positives from those that are not. In the context of genetics, this entails identifying the genetic markers that actually underlie medically relevant phenotypes among the vast number of markers typically interrogated in genome-wide studies. A critical part of these decisions relies on the appropriate statistical assessment of data obtained from tests across numerous markers. Several methods have been developed to aid such analyses, with family-wise approaches, such as the Bonferroni and Dunn-Šidák corrections, being popular. Conditions that motivate the use of family-wise corrections are explored. Although simple to implement, these approaches have one major limitation: they assume that p-values are i.i.d. and uniformly distributed under the null hypothesis. Several factors may violate this assumption in genome-wide studies, including confounding by population stratification, the presence of related individuals, the correlation structure among genetic markers, and the use of limiting distributions for test statistics. Even after adjustment for such effects, the distribution of p-values can depart substantially from a uniform distribution under the null hypothesis. In this work, I present a decision theory for the use of family-wise corrections for multiplicity and a generalization of the Dunn-Šidák correction that relaxes the assumption of uniformly distributed null p-values. The independence assumption is also relaxed, and is handled by calculating the effective number of independent tests. I also explicitly show the relationship between order statistics and family-wise correction procedures. This generalization may be applicable to multiplicity problems outside of genomics.
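
As a rough illustration of the corrections named above, the following Python sketch computes the Bonferroni and Dunn-Šidák per-test thresholds and shows the order-statistic view behind them: for m independent null p-values with common CDF F, the smallest p-value falls at or below t with probability 1 - (1 - F(t))^m. The function names, the `null_quantile` and `null_cdf` arguments, and the example null CDF F(x) = x^0.9 are assumptions made for illustration only; this is not the paper's own formulation or code, and the effective number of tests is simply passed in rather than estimated.

```python
import math


def bonferroni_threshold(alpha, m):
    """Per-test threshold alpha / m for m tests."""
    return alpha / m


def sidak_threshold(alpha, m):
    """Per-test threshold 1 - (1 - alpha)**(1/m), computed stably for large m."""
    return -math.expm1(math.log1p(-alpha) / m)


def generalized_sidak_threshold(alpha, m_eff, null_quantile=None):
    """Threshold t solving 1 - (1 - F(t))**m_eff = alpha.

    null_quantile is a hypothetical inverse CDF of the null p-values;
    with None, the null is taken to be uniform and the usual
    Dunn-Sidak threshold is returned.
    """
    u = sidak_threshold(alpha, m_eff)
    return u if null_quantile is None else null_quantile(u)


def fwer_from_threshold(t, m, null_cdf=lambda x: x):
    """P(min of m independent null p-values <= t) = 1 - (1 - F(t))**m."""
    return -math.expm1(m * math.log1p(-null_cdf(t)))


if __name__ == "__main__":
    alpha, m_eff = 0.05, 1_000_000  # illustrative values only

    print("Bonferroni:", bonferroni_threshold(alpha, m_eff))
    print("Dunn-Sidak:", sidak_threshold(alpha, m_eff))

    # Hypothetical non-uniform null: F(x) = x**0.9, so small p-values
    # are more frequent than under a uniform null.
    a = 0.9
    t = generalized_sidak_threshold(alpha, m_eff, lambda u: u ** (1 / a))
    print("Generalized:", t)
    print("FWER check:", fwer_from_threshold(t, m_eff, lambda x: x ** a))
```

Under the assumed null, the generalized threshold is stricter than the uniform-null Dunn-Šidák threshold, because small p-values are more likely by chance; the final line confirms that the family-wise error rate at that threshold returns to the nominal level.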
