Testing Untyped Alleles (TUNA)—applications to genome‐wide association studies

The large number of tests performed in analyzing data from genome‐wide association studies strongly affects the power to detect risk variants, so analytic strategies that specify the optimal set of hypotheses to test are necessary. We propose a genome‐wide strategy based on one degree of freedom tests for all genotyped variants and for all untyped variants for which the observed data carry sufficient information. The set of untyped variants to be tested is found using multi‐locus measures of linkage disequilibrium and haplotype frequencies from a reference database such as HapMap (The International HapMap Consortium [2003] Nature 426:789–796). We introduce a novel statistic for testing differences in allele frequencies at untyped variants that is based on linear combinations of estimable haplotype frequencies. We also describe algorithms for finding the sets of genotyped markers to use in testing an untyped allele, and ways of incorporating haplotypes observed in the study data but not in the reference database. The proposed testing strategy can serve as the first step in the analysis of genome‐wide association data and, because every test is directed at a specific marker, it can be used to specify the set of polymorphisms to genotype in follow‐up studies. The methodology also provides a tool for joint analysis of data from studies done on different platforms. Genet. Epidemiol. 2006. © 2006 Wiley‐Liss, Inc.
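
The idea of testing an untyped allele through a linear combination of haplotype frequencies can be illustrated with a minimal sketch. It is not the statistic defined in the paper: it assumes that haplotype counts at the genotyped markers are observed directly (phase known), whereas in practice the frequencies would be estimated from unphased genotypes, and it assumes that the weights, taken here to be the reference-panel probability that each haplotype carries the untyped allele, are known without error. The function name `untyped_allele_z` and all argument names are hypothetical.

```python
import numpy as np

def untyped_allele_z(case_counts, control_counts, weights):
    """Illustrative 1-df test for an untyped allele (simplified sketch).

    case_counts, control_counts: haplotype counts at the genotyped markers
        (assumption: phase is known, so counts are multinomial).
    weights: for each haplotype, the assumed reference-panel probability
        that it carries the untyped allele.
    Returns (p_case, p_control, z).
    """
    case_counts = np.asarray(case_counts, dtype=float)
    control_counts = np.asarray(control_counts, dtype=float)
    w = np.asarray(weights, dtype=float)

    n_case = case_counts.sum()
    n_ctrl = control_counts.sum()

    # Untyped allele frequency as a linear combination of haplotype frequencies.
    p_case = np.dot(w, case_counts / n_case)
    p_ctrl = np.dot(w, control_counts / n_ctrl)

    # Pooled haplotype frequencies, used for the variance under the null.
    h_pool = (case_counts + control_counts) / (n_case + n_ctrl)

    # Variance of w[H] for one multinomial haplotype draw H:
    # E[w^2] - (E[w])^2 under the pooled frequencies.
    var_unit = np.dot(w ** 2, h_pool) - np.dot(w, h_pool) ** 2
    if var_unit <= 0:
        return p_case, p_ctrl, float("nan")

    se = np.sqrt(var_unit * (1.0 / n_case + 1.0 / n_ctrl))
    return p_case, p_ctrl, (p_case - p_ctrl) / se

# With two haplotypes and weights (1, 0) the sketch reduces to the usual
# two-sample comparison of allele frequencies.
p_case, p_ctrl, z = untyped_allele_z([60, 40], [40, 60], [1.0, 0.0])
```

Under these assumptions the squared statistic would be compared with a chi-square distribution with one degree of freedom. A full treatment would also need to account for the uncertainty in estimated haplotype frequencies and in the reference-panel weights.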
