Genotype Imputation of Metabochip SNPs Using a Study-Specific Reference Panel of ~4,000 Haplotypes in African Americans From the Women’s Health Initiative

Genetic imputation has become standard practice in modern genetic studies. However, several important issues have not been adequately addressed, including the utility of study-specific reference panels, performance in admixed populations, and quality for less common (minor allele frequency [MAF] 0.005–0.05) and rare (MAF < 0.005) variants. These issues only recently became addressable, through genome-wide association study (GWAS) follow-up studies that use dense genotyping or sequencing in large samples of non-European individuals. In this work, we constructed a study-specific reference panel of 3,924 haplotypes using African Americans in the Women’s Health Initiative (WHI) genotyped on both the Metabochip and the Affymetrix 6.0 GWAS platform. We used this reference panel to impute into 6,459 WHI SNP Health Association Resource (SHARe) study subjects with only GWAS genotypes. Our analysis confirmed the imputation quality metric Rsq (estimated r², specific to each SNP) as an effective post-imputation filter. We recommend different Rsq thresholds for different MAF categories, chosen such that the average (across SNPs) Rsq is above the desired dosage r² (squared Pearson correlation between imputed and experimental genotypes). With a desired dosage r² of 80%, 99.9% (97.5%, 83.6%, 52.0%, 20.5%) of SNPs with MAF > 0.05 (0.03–0.05, 0.01–0.03, 0.005–0.01, and 0.001–0.005) passed the post-imputation filter. The average dosage r² for these SNPs is 94.7%, 92.1%, 89.0%, 83.1%, and 79.7%, respectively. These results suggest that, for African Americans, imputation of Metabochip SNPs from GWAS data, including low-frequency SNPs with MAF 0.005–0.05, is feasible and worthwhile for increasing power in downstream association analysis, provided a sizable reference panel is available.
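
The two quantities in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names `dosage_r2` and `choose_rsq_threshold` are hypothetical. Picking the most lenient threshold that still meets the target is an assumption about how the recommendation might be applied within one MAF category, and the Rsq values in the usage lines are made up.

```python
def dosage_r2(imputed, observed):
    """Squared Pearson correlation between imputed dosages and
    experimental genotypes (coded 0/1/2) for a single SNP."""
    n = len(imputed)
    mean_x = sum(imputed) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, observed))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in observed)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a monomorphic SNP
    return sxy * sxy / (sxx * syy)


def choose_rsq_threshold(rsq_values, target):
    """Assumed rule: return the lowest Rsq cutoff for which the mean Rsq of
    SNPs with Rsq >= cutoff is at least `target`, together with the fraction
    of SNPs that pass. Returns None if no cutoff meets the target."""
    ordered = sorted(rsq_values, reverse=True)
    running_sum = 0.0
    best = None
    for i, value in enumerate(ordered):
        running_sum += value
        # Only evaluate a cutoff once all SNPs tied at this Rsq are included.
        group_ends = i + 1 == len(ordered) or ordered[i + 1] != value
        if not group_ends:
            continue
        if running_sum / (i + 1) >= target:
            best = (value, (i + 1) / len(ordered))
        else:
            break  # the running mean of descending values cannot recover
    return best


# Hypothetical Rsq values for the SNPs in one MAF category.
rsq_in_bin = [0.95, 0.9, 0.7, 0.5, 0.2]
result = choose_rsq_threshold(rsq_in_bin, target=0.8)
```

In this sketch the function would be called once per MAF category, so a rarer category, whose Rsq values tend to be lower, ends up with a stricter cutoff and a smaller passing fraction.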
