Improved Statistics for Genome-Wide Interaction Analysis

Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, the reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only a negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although they were designed to test specifically for interaction, we show that some of these previously proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find that, in most situations considered, the highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally proposed statistics, on account of the inflated error rate that can result.

[1] Judy H. Cho, et al. Finding the missing heritability of complex diseases, 2009, Nature.

[2] D. Clayton. Prediction and Interaction in Complex Disease Genetics: Experience in Type 1 Diabetes, 2009, PLoS genetics.

[3] C R Weinberg, et al. Choosing a retrospective design to assess joint genetic and environmental contributions to risk, 2000, American journal of epidemiology.

[4] W. G. Hill, et al. Genome partitioning of genetic variation for complex traits using common SNPs, 2011, Nature Genetics.

[5] Nilanjan Chatterjee, et al. Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies, 2005.

[6] D. Thomas, et al. Biological models and statistical interactions: an example from multistage carcinogenesis, 1981, International journal of epidemiology.

[7] Momiao Xiong, et al. A Novel Statistic for Genome-Wide Interaction Analysis, 2010, PLoS genetics.

[8] M. Jarvelin, et al. A Common Variant in the FTO Gene Is Associated with Body Mass Index and Predisposes to Childhood and Adult Obesity, 2007, Science.

[9] L. Peltonen, et al. Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis, 2011, Nature Genetics.

[10] Judy H. Cho, et al. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease, 2008, Nature Genetics.

[11] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[12] M. McCarthy, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes, 2008, Nature Genetics.

[13] D. Balding. A tutorial on statistical methods for population association studies, 2006, Nature Reviews Genetics.

[14] P. Visscher, et al. Common SNPs explain a large proportion of heritability for human height, 2011.

[15] Peter Kraft, et al. Using principal components of genetic variation for robust and powerful detection of gene-gene interactions in case-control and case-only studies, 2010, American journal of human genetics.

[16] W. Willett, et al. Large-scale exploration of gene-gene interactions in prostate cancer using a multistage genome-wide association study, 2011, Cancer research.

[17] A. Ziegler, et al. A Genotype-Based Approach to Assessing the Association between Single Nucleotide Polymorphisms, 2008, Human Heredity.

[18] H. Cordell. Detecting gene–gene interactions that underlie human diseases, 2009, Nature Reviews Genetics.

[19] B. Schölkopf, et al. EPIBLASTER-fast exhaustive two-locus epistasis detection strategy using graphical processing units, 2011, European Journal of Human Genetics.

[20] W. Gauderman. Sample size requirements for association studies of gene-gene interaction, 2002, American journal of epidemiology.

[21] H. Cordell. Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans, 2002, Human molecular genetics.

[22] Jack A. Taylor, et al. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies, 1994, Statistics in medicine.

[23] W. Thompson, et al. Effect modification and the limits of biological inference from epidemiologic data, 1991, Journal of clinical epidemiology.

[24] Bhramar Mukherjee, et al. Exploiting Gene‐Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes‐Type Shrinkage Estimator to Trade‐Off between Bias and Efficiency, 2008, Biometrics.

[25] R. Elston, et al. The Meaning of Interaction, 2010, Human Heredity.

[26] Simon C. Potter, et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls, 2007, Nature.

[27] P. Sasieni. From genotypes to genes: doubling the sample size, 1997, Biometrics.

[28] A. Brown, et al. Sample sizes required to detect linkage disequilibrium between two or three loci, 1975, Theoretical population biology.

[29] Peter Kraft, et al. Exploiting Gene-Environment Interaction to Detect Genetic Associations, 2007, Human Heredity.

[30] P. Phillips. Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems, 2008, Nature Reviews Genetics.

[31] W D Flanders, et al. Case-only design to measure gene-gene interaction, 1999, Epidemiology.

[32] E. J. van den Oord, et al. Variance component analysis of polymorphic metabolic systems, 2006, Journal of theoretical biology.

[33] P. Phillips. The language of gene interaction, 1998, Genetics.

[34] R. A. Bailey, et al. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes, 2007, Nature Genetics.

[35] Dmitri V Zaykin, et al. Contrasting linkage-disequilibrium patterns between cases and controls as a novel association-mapping method, 2006, American journal of human genetics.

[36] R. Lewontin, et al. On measures of gametic disequilibrium, 1988, Genetics.

[37] Juliet M Chapman, et al. Detecting association using epistatic information, 2007, Genetic epidemiology.

[38] O. Delaneau, et al. A linear complexity phasing method for thousands of genomes, 2011, Nature Methods.

[39] Manuel A. R. Ferreira, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses, 2007, American journal of human genetics.