Leveraging genetic variability across populations for the identification of causal variants.

Genome-wide association studies have been performed extensively in recent years and have identified many genomic regions associated with complex traits. Often, a SNP found to be associated with the condition is not the causal SNP itself but a proxy for it, as a result of linkage disequilibrium. To identify the actual causal SNP, a fine-mapping follow-up is performed, either by dense genotyping or by sequencing of the region. In either case, if the causal SNP is in high linkage disequilibrium with other SNPs, fine mapping requires a very large sample size to identify it. Here, we show that leveraging genetic variability across populations significantly increases the localization success rate (LSR) for a causal SNP in a follow-up study that involves multiple populations, compared with a study that involves only one population. Thus, the average power to detect the causal variant is higher in a joint analysis than in studies that analyze one population at a time. On the basis of this observation, we developed a framework for efficiently searching for a follow-up study design: it searches a pool of available populations for the combination that maximizes the LSR for detecting a causal variant. This framework and its accompanying software can be used to considerably enhance the power of fine-mapping studies.
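
The intuition that differing linkage disequilibrium across populations helps localization can be illustrated with a small calculation. The sketch below is not the framework or software described above, and it does not compute the LSR as the authors define it. It only considers one causal SNP and one proxy, and assumes a standard normal approximation: within each population the two association statistics are bivariate normal with unit variances, correlation r, and means equal to the non-centrality parameter (NCP) at the causal SNP and r times that NCP at the proxy. Populations are assumed independent and combined with equal weights. The function names and the numeric values (NCP of 5, r of 0.95 and 0.5) are hypothetical choices for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_causal_beats_proxy(ncps, rs):
    """Probability that the causal SNP's combined statistic exceeds the proxy's.

    ncps: per-population non-centrality parameters at the causal SNP (assumed).
    rs:   per-population correlation r between causal SNP and proxy (assumed >= 0).
    Statistics are combined across populations as sum(z_k) / sqrt(K).
    """
    k = len(ncps)
    # Per population, z_causal - z_proxy ~ Normal(ncp * (1 - r), 2 * (1 - r)).
    mean = sum(ncp * (1.0 - r) for ncp, r in zip(ncps, rs)) / math.sqrt(k)
    var = sum(2.0 * (1.0 - r) for r in rs) / k
    if var == 0.0:
        # Perfect LD in every population: the two SNPs are indistinguishable.
        return 0.5
    return normal_cdf(mean / math.sqrt(var))


# Single population: proxy in high LD with the causal SNP.
total_ncp = 5.0
single = prob_causal_beats_proxy([total_ncp], [0.95])

# Same total sample split evenly over two populations, so each NCP shrinks
# by sqrt(2); the second population has weaker LD between the two SNPs.
split_ncp = total_ncp / math.sqrt(2.0)
joint = prob_causal_beats_proxy([split_ncp, split_ncp], [0.95, 0.5])

print(f"single population: {single:.3f}")
print(f"two populations:   {joint:.3f}")
```

Under these assumptions the two-population design ranks the causal SNP above the proxy more often than the single-population design with the same total sample size, because the second population breaks the correlation that makes the two SNPs hard to tell apart. A real fine-mapping region contains many candidate SNPs, so this pairwise probability is only a simplified stand-in for the LSR.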
