ATOM: a powerful gene-based association test by combining optimally weighted markers

BACKGROUND Large-scale candidate-gene and genome-wide association studies genotype multiple SNPs within or surrounding a gene, including both tag and functional SNPs. The immense amount of data generated in these studies poses new challenges to analysis. One particularly challenging yet important question is how to best use all genetic information to test whether a gene or a region is associated with the trait of interest. METHODS Here we propose a powerful gene-based Association Test by combining Optimally Weighted Markers (ATOM) within a genomic region. Due to variation in linkage disequilibrium, different markers often associate with the trait of interest at different levels. To appropriately apportion their contributions, we assign a weight to each marker that is proportional to the amount of information it captures about the trait locus. We analytically derive the optimal weights for both quantitative and binary traits, and describe a procedure for estimating the weights from a reference database such as the HapMap. Compared with existing approaches, our method has several distinct advantages, including (i) the ability to borrow information from an external database to increase power, (ii) the theoretical derivation of optimal marker weights and (iii) the scalability to simultaneous analysis of all SNPs in candidate genes and pathways. RESULTS Through extensive simulations and analysis of the FTO gene in our ongoing genome-wide association study on childhood obesity, we demonstrate that ATOM increases the power to detect genetic association as compared with several commonly used multi-marker association tests.

[1]  Kai Wang,et al.  A principal components regression approach to multilocus genetic association studies , 2008, Genetic epidemiology.

[2]  M. McCarthy,et al.  Replication of Genome-Wide Association Signals in UK Samples Reveals Risk Loci for Type 2 Diabetes , 2007, Science.

[3]  Struan F. A. Grant,et al.  Association Analysis of the FTO Gene with Obesity in Children of Caucasian and African Ancestry Reveals a Common Tagging SNP , 2008, PloS one.

[4]  G. Satten,et al.  Inference on haplotype effects in case-control studies using unphased genotype data. , 2003, American journal of human genetics.

[5]  David V Conti,et al.  Testing association between disease and multiple SNPs in a candidate gene , 2007, Genetic epidemiology.

[6]  M. Olivier A haplotype map of the human genome. , 2003, Nature.

[7]  Tao Wang,et al.  Improved power by use of a weighted score test for linkage disequilibrium mapping. , 2007, American journal of human genetics.

[8]  Yun Li,et al.  CFH haplotypes without the Y402H coding variant show strong association with susceptibility to age-related macular degeneration , 2006, Nature Genetics.

[9]  M. Olivier A haplotype map of the human genome , 2003, Nature.

[10]  Kai Wang,et al.  Pathway-based approaches for analysis of genomewide association studies. , 2007, American journal of human genetics.

[11]  Judy H. Cho,et al.  A Genome-Wide Association Study Identifies IL23R as an Inflammatory Bowel Disease Gene , 2006, Science.

[12]  Lachlan James M. Coin,et al.  Disease association tests by inferring ancestral haplotypes using a hidden markov model , 2008, Bioinform..

[13]  D. Schaid,et al.  Score tests for association between traits and haplotypes when linkage phase is ambiguous. , 2002, American journal of human genetics.

[14]  S. Gabriel,et al.  Efficiency and power in genetic association studies , 2005, Nature Genetics.

[15]  G. Abecasis,et al.  A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants , 2007, Science.

[16]  D. Conrad,et al.  A worldwide survey of haplotype variation and linkage disequilibrium in the human genome , 2006, Nature Genetics.

[17]  Daniel J Schaid,et al.  Genetic epidemiology and haplotypes , 2004, Genetic epidemiology.

[18]  Tao Jiang,et al.  Genetics and population analysis Haplotype-based linkage disequilibrium mapping via direct data mining , 2005 .

[19]  Joshua T. Burdick,et al.  Mapping determinants of human gene expression by regional and genome-wide association , 2005, Nature.

[20]  Zhaohui S. Qin,et al.  A second generation human haplotype map of over 3.1 million SNPs , 2007, Nature.

[21]  P. Donnelly,et al.  A new multipoint method for genome-wide association studies by imputation of genotypes , 2007, Nature Genetics.

[22]  Eran Halperin,et al.  Leveraging the HapMap correlation structure in association studies. , 2007, American journal of human genetics.

[23]  Carl D Langefeld,et al.  Ordered subset analysis in genetic linkage mapping of complex traits , 2004, Genetic epidemiology.

[24]  Marcia M. Nizzari,et al.  Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels , 2007, Science.

[25]  J. Ott,et al.  Complement Factor H Polymorphism in Age-Related Macular Degeneration , 2005, Science.

[26]  A. Clark,et al.  The role of haplotypes in candidate gene studies , 2004, Genetic epidemiology.

[27]  Randall Pruim,et al.  Tag SNP selection for Finnish individuals based on the CEPH Utah HapMap database , 2006, Genetic epidemiology.

[28]  P. Sham,et al.  A note on the calculation of empirical P values from Monte Carlo procedures. , 2002, American journal of human genetics.

[29]  J. Ott,et al.  Complement Factor H Polymorphism in Age-Related Macular Degeneration , 2005, Science.

[30]  Dan L Nicolae,et al.  Testing Untyped Alleles (TUNA)—applications to genome‐wide association studies , 2006, Genetic epidemiology.