Power comparisons between similarity‐based multilocus association methods, logistic regression, and score tests for haplotypes

Recently, a genomic distance‐based regression for multilocus associations was proposed (Wessel and Schork [2006] Am. J. Hum. Genet. 79:792–806) in which either locus or haplotype scoring can be used to measure genetic distance. Although it allows various measures of genomic similarity and simultaneous analyses of multiple phenotypes, its power relative to other methods for case‐control analyses is not well known. We compare the power of traditional methods with this new distance‐based approach, for both locus‐scoring and haplotype‐scoring strategies. We discuss the relative power of these association methods with respect to five properties: (1) marker informativity; (2) the number of markers; (3) the causal allele frequency; (4) the preponderance of the most common high‐risk haplotype; and (5) the correlation between the causal single‐nucleotide polymorphism (SNP) and its flanking markers. We found that locus‐based logistic regression and the global score test for haplotypes lost power when many markers were included in the analyses, because of the large number of degrees of freedom. In contrast, the distance‐based approach was less vulnerable to increases in the number of markers or haplotypes. A genotype counting measure was more sensitive to marker informativity and to the correlation between the causal SNP and its flanking markers. After examining the impact of the five properties on power, we found that, on average, the genomic distance‐based regression that uses a matching measure for diplotypes was the most powerful and robust of the seven methods we compared. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.
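As a rough illustration of the distance‐based regression idea, the sketch below computes a pseudo‐F statistic from a pairwise distance matrix and assesses it by permutation. It is a minimal sketch under stated assumptions, not the authors' implementation: the allele‐sharing distance, the function names (`ibs_distance`, `pseudo_f`, `permutation_p_value`), and the 0/1/2 genotype coding are all chosen here for illustration and are not the specific similarity measures compared in the paper.

```python
import numpy as np


def ibs_distance(genotypes):
    """Pairwise distance from unphased genotypes coded 0/1/2 (n subjects x m loci).

    Assumed measure for illustration: the mean absolute genotype difference
    across loci, scaled to [0, 1].
    """
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :])  # shape (n, n, m)
    return diff.mean(axis=2) / 2.0


def pseudo_f(dist, x):
    """Pseudo-F statistic for regressing a distance matrix on predictor(s) x."""
    n = dist.shape[0]
    a = -0.5 * dist ** 2                      # element-wise transformed distances
    c = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    g = c @ a @ c                             # Gower-centered matrix
    design = np.column_stack([np.ones(n), x])  # intercept plus predictor(s)
    h = design @ np.linalg.pinv(design.T @ design) @ design.T  # hat matrix
    r = np.eye(n) - h
    m = design.shape[1] - 1                   # predictors, excluding the intercept
    explained = np.trace(h @ g @ h) / m
    residual = np.trace(r @ g @ r) / (n - m - 1)
    return explained / residual


def permutation_p_value(dist, x, n_perm=999, seed=0):
    """Permute the phenotype labels to obtain an empirical p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = pseudo_f(dist, x)
    hits = sum(pseudo_f(dist, rng.permutation(x)) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (made up): three cases and three controls at three SNPs.
    toy_genotypes = [[2, 2, 2], [2, 2, 1], [2, 1, 2],
                     [0, 0, 0], [0, 0, 1], [0, 1, 0]]
    toy_status = [1, 1, 1, 0, 0, 0]
    f_stat, p_val = permutation_p_value(ibs_distance(toy_genotypes), toy_status)
    print(f_stat, p_val)
```

Because the test statistic uses only the distance matrix, adding markers changes how distances are computed but not the degrees of freedom of the test, which is consistent with the robustness to marker number described above.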

[1]  Stacey S Cherny, et al. The impact of genotyping error on family-based analysis of quantitative traits, 2001, European Journal of Human Genetics.

[2]  Lambertus Klei, et al. Testing for association based on excess allele sharing in a sample of related cases and controls, 2007, Human Genetics.

[3]  G R Grant, et al. Significance testing for direct identity‐by‐descent mapping, 1999, Annals of Human Genetics.

[4]  Juliet M Chapman, et al. Detecting Disease Associations due to Linkage Disequilibrium Using Haplotype Tags: A Class of Tests and the Determinants of Statistical Power, 2003, Human Heredity.

[5]  S. Nelson, et al. Genomic mismatch scanning identifies human genomic DNA shared identical by descent, 1998, Genomics.

[6]  M. Iles, et al. Fine-scale mapping in case-control samples using locus scoring and haplotype-sharing methods, 2005, BMC Genetics.

[7]  Marti J. Anderson, et al. A new method for non-parametric multivariate analysis of variance in ecology, 2001.

[8]  Toshihiro Tanaka. The International HapMap Project, 2003, Nature.

[9]  E. Génin, et al. Use of closely related affected individuals for the genetic study of complex diseases in founder populations, 2001, American Journal of Human Genetics.

[10]  Claire Bardel, et al. Clustering of haplotypes based on phylogeny: how good a strategy for association testing?, 2006, European Journal of Human Genetics.

[11]  M. Olivier. A haplotype map of the human genome, 2003, Nature.

[12]  D. Schaid, et al. Score tests for association between traits and haplotypes when linkage phase is ambiguous, 2002, American Journal of Human Genetics.

[13]  A. G. Heidema, et al. The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases, 2006, BMC Genetics.

[14]  L. Wasserman, et al. On the identification of disease mutations by the analysis of haplotype similarity and goodness of fit, 2003, American Journal of Human Genetics.

[15]  Richard R. Hudson, et al. Generating samples under a Wright-Fisher neutral model of genetic variation, 2002, Bioinform.

[16]  Daniel J Schaid, et al. Relative efficiency of ambiguous vs. directly measured haplotype frequencies, 2002, Genetic Epidemiology.

[17]  N. Schork, et al. Generalized genomic distance-based regression methodology for multilocus association analysis, 2006, American Journal of Human Genetics.

[18]  B Devlin, et al. Genomic control for association studies: a semiparametric test to detect excess-haplotype sharing, 2000, Biostatistics.

[19]  G. Abecasis, et al. Using haplotype blocks to map human complex trait loci, 2003, Trends in Genetics: TIG.

[20]  T. Meerman, et al. Haplotype sharing analysis in affected individuals from nuclear families with at least one affected offspring, 1997.

[21]  Ao Yuan, et al. Detecting disease gene in DNA haplotype sequences by nonparametric dissimilarity test, 2006, Human Genetics.

[22]  K. Cheng, et al. Simultaneously correcting for population stratification and for genotyping error in case-control association studies, 2007, American Journal of Human Genetics.

[23]  J. Gower. Some distance properties of latent root and vector methods used in multivariate analysis, 1966.

[24]  Daniel J Schaid, et al. Nonparametric tests of association of multiple genes with human disease, 2005, American Journal of Human Genetics.

[25]  Peter H. Westfall, et al. Testing Association of Statistically Inferred Haplotypes with Discrete and Continuous Traits in Samples of Unrelated Individuals, 2002, Human Heredity.

[26]  Qiuying Sha, et al. A new association test using haplotype similarity, 2007, Genetic Epidemiology.

[27]  T. Hänninen, et al. Association of apolipoprotein E phenotypes with late onset Alzheimer's disease: population based study, 1994, BMJ.

[28]  Jung-Ying Tzeng, et al. Regression-based multi-marker analysis for genome-wide association studies using haplotype similarity, 2007.

[29]  Jung-Ying Tzeng, et al. Haplotype-based association analysis via variance-components score test, 2007, American Journal of Human Genetics.

[30]  Nancy Role Mendell, et al. Characteristics of replicated single-nucleotide polymorphism genotypes from COGA: Affymetrix and Center for Inherited Disease Research, 2005, BMC Genetics.

[31]  D. Schaid. Evaluating associations of haplotypes with traits, 2004, Genetic Epidemiology.

[32]  D. Balding. A tutorial on statistical methods for population association studies, 2006, Nature Reviews Genetics.

[33]  D. Clayton, et al. A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes, 2002, American Journal of Human Genetics.

[34]  Jelle J. Goeman, et al. A global test for groups of genes: testing association with a clinical outcome, 2004, Bioinform.

[35]  M. Boehnke, et al. Experimentally-derived haplotypes substantially increase the efficiency of linkage disequilibrium studies, 2001, Nature Genetics.

[36]  D. Schaid. Power and Sample Size for Testing Associations of Haplotypes with Complex Traits, 2006, Annals of Human Genetics.

[37]  Larry Wasserman, et al. Outlier Detection and False Discovery Rates for Whole-Genome DNA Matching, 2003.

[38]  N. Kaplan, et al. Testing for association with a case‐parents design in the presence of genotyping errors, 2004, Genetic Epidemiology.

[39]  Daniel J Schaid, et al. The complex genetic epidemiology of prostate cancer, 2004, Human Molecular Genetics.

[40]  Jason Cooper, et al. Use of unphased multilocus genotype data in indirect association studies, 2004, Genetic Epidemiology.

[41]  P. Sham, et al. Investigation of the Ability of Haplotype Association and Logistic Regression to Identify Associated Susceptibility Loci, 2006, Annals of Human Genetics.

[42]  Brian H. McArdle, et al. Fitting multivariate models to community data: a comment on distance‐based redundancy analysis, 2001.