Increasing Power of Genome-Wide Association Studies by Collecting Additional Single-Nucleotide Polymorphisms

Genome-wide association studies (GWASs) have been effective at identifying genomic regions associated with disease traits. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case/control individuals. Once the tag SNP statistics are computed, the genomic regions in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms, and following up on every SNP in these regions for biological validation is costly and often infeasible. In this article we address how to characterize these regions cost-effectively, with the goal of giving investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying untyped associated SNPs: additional SNPs, called follow-up SNPs, are selected from the associated regions and genotyped in the original case/control individuals. We propose a novel SNP selection method that aims to maximize the number of associated SNPs among the chosen follow-up SNPs, and we show how the observed statistics of the original tag SNPs, together with human genetic variation reference data such as those of the HapMap Project, can be used to identify the follow-up SNPs. Using simulated and real association studies based on the HapMap data and the Wellcome Trust Case Control Consortium, we demonstrate that our method outperforms the traditional correlation- and distance-based follow-up SNP selection approaches. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.
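To make the selection goal concrete, the sketch below uses a deliberately simplified model that is an assumption, not the method described in this article. Each candidate follow-up SNP is paired with a single tag SNP; the tag's observed z-statistic and the tag/candidate LD correlation r (for example, estimated from a reference panel such as HapMap) are treated as a standard bivariate normal pair. Conditioning on the observed tag statistic gives the candidate's statistic a normal distribution with mean r·z and variance 1 − r², from which we compute the probability that the candidate would exceed a significance threshold. By linearity of expectation, the expected number of associated SNPs among k chosen candidates is the sum of these probabilities, so taking the k highest-probability candidates maximizes that expectation under this toy model. The function names `prob_exceeds` and `select_follow_up`, the SNP identifiers, and the threshold of about 5.45 (roughly a two-sided p = 5e-8) are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_exceeds(tag_z, r, z_threshold):
    """P(|S_u| > z_threshold | S_t = tag_z), assuming (S_t, S_u) is a
    standard bivariate normal pair with correlation r (illustrative model)."""
    if abs(r) >= 1.0:
        # Perfect LD: the untyped statistic equals the tag statistic up to sign.
        return 1.0 if abs(tag_z) > z_threshold else 0.0
    # Conditional distribution of the untyped statistic given the tag statistic.
    cond = NormalDist(mu=r * tag_z, sigma=sqrt(1.0 - r * r))
    # Two-sided tail: upper tail above +threshold plus lower tail below -threshold.
    return (1.0 - cond.cdf(z_threshold)) + cond.cdf(-z_threshold)


def select_follow_up(candidates, k, z_threshold):
    """Pick the k candidates with the highest exceedance probability.

    candidates: list of (snp_id, tag_z, r) tuples (hypothetical input format).
    Because the expected count of associated SNPs is the sum of the
    per-SNP probabilities, the top-k choice maximizes that expectation.
    """
    scored = [(prob_exceeds(z, r, z_threshold), snp) for snp, z, r in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [snp for _, snp in scored[:k]]


if __name__ == "__main__":
    # Hypothetical candidates: (id, observed tag z-statistic, LD r with that tag).
    candidates = [("rsA", 5.5, 0.95), ("rsB", 5.5, 0.4), ("rsC", 3.0, 0.9)]
    print(select_follow_up(candidates, k=2, z_threshold=5.45))
```

In this toy example, a candidate in strong LD with a strongly associated tag ranks first, while high LD with a weakly associated tag ("rsC") is not enough to be selected. This contrasts with ranking purely by correlation or physical distance.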
