Comparison of association methods for dense marker data

While data sets based on dense genome scans are becoming increasingly common, many theoretical questions remain unanswered. How can a large number of markers in high linkage disequilibrium (LD) and rare disease variants be simulated efficiently? Should markers in high LD be analyzed individually or jointly? Are there fast and simple methods to adjust for correlation among tests? What is the power penalty for conservative Bonferroni adjustments? Assuming that association scans are adequately powered, we attempt to answer these questions. The performance of single‐point and multipoint tests, and of their hybrids, is investigated using two simulation designs. The first design uses theoretically derived LD patterns; the second uses LD patterns based on real data. For the theoretical simulations, we used the polychoric correlation as a measure of LD to facilitate simulation of markers in LD and of rare disease variants. Based on the results of the two simulation studies, we conclude that statistical tests assuming only additive genotype effects (i.e., the Armitage test and especially the multipoint T² test) should be used cautiously because of their suboptimal power in certain settings. A false discovery rate (FDR)‐adjusted combination of tests for additive, dominant, and recessive effects had close to optimal power. However, the common genotypic χ² test performed adequately and could be used in lieu of the FDR combination. While some hybrid methods yield (sometimes spectacularly) higher power, they are computationally intensive. We also propose an “exact” method to adjust for multiple testing, which yields nominally higher power than the Bonferroni correction. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
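
The contrast between an additive-only test and the genotypic χ² test can be illustrated at a single marker. The following is a minimal sketch, not the simulation code used in the study: it computes the standard Cochran-Armitage trend statistic (1 df, with assumed additive scores 0, 1, 2) and the Pearson genotypic χ² (2 df) on a 2×3 case-control genotype table. The function names and the genotype counts are hypothetical and chosen only to show an extreme non-additive pattern; the sketch also assumes every genotype class is observed at least once.

```python
import math


def genotypic_chi2(cases, controls):
    """Pearson chi-square on a 2x3 genotype table (2 df).

    Assumes all three genotype columns are non-empty.
    Returns (statistic, p_value).
    """
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(cases, controls):
        col = a + b
        for obs, row in ((a, n_case), (b, n_ctrl)):
            expected = row * col / n
            stat += (obs - expected) ** 2 / expected
    # Survival function of a chi-square with 2 df is exp(-x / 2).
    return stat, math.exp(-stat / 2)


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) with additive scores by default.

    Returns (statistic, p_value).
    """
    r, s = sum(cases), sum(controls)
    n = r + s
    totals = [a + b for a, b in zip(cases, controls)]
    t = sum(w * (a * s - b * r) for w, a, b in zip(weights, cases, controls))
    sum_w2n = sum(w * w * m for w, m in zip(weights, totals))
    sum_wn = sum(w * m for w, m in zip(weights, totals))
    var = (r * s / n) * (n * sum_w2n - sum_wn ** 2)
    if var == 0:
        raise ValueError("trend variance is zero; only one genotype class observed")
    stat = t * t / var
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts for genotypes (AA, Aa, aa); illustration only.
    cases = (30, 40, 30)
    controls = (20, 60, 20)
    print("genotypic:", genotypic_chi2(cases, controls))
    print("trend:    ", armitage_trend(cases, controls))
```

In this made-up table the heterozygote frequency differs between cases and controls while the two homozygote classes shift symmetrically, so the additive trend statistic is zero and the genotypic test still detects the difference. Real data will rarely be this extreme, but the example shows why a test restricted to additive effects can lose power under non-additive models.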
