Evaluation of the imputation performance of the program IMPUTE in an admixed sample from Mexico City using several model designs

Background: We explored the imputation performance of the program IMPUTE in an admixed sample from Mexico City. The following issues were evaluated: (a) the impact of different reference panels (HapMap vs. 1000 Genomes) on imputation; (b) potential differences in imputation performance between single-step and two-step (phasing followed by imputation) approaches; (c) the effect of different posterior genotype probability thresholds on imputation performance; and (d) imputation performance for common vs. rare markers.

Methods: The sample from Mexico City comprised 1,310 individuals genotyped with the Affymetrix 5.0 array. We randomly masked 5% of the markers directly genotyped on chromosome 12 (n = 1,046) and compared the imputed genotypes with the microarray genotype calls. Imputation was carried out with the program IMPUTE. The concordance rate between the imputed and observed genotypes was used as a measure of imputation accuracy, and the proportion of non-missing genotypes as a measure of imputation efficacy.

Results: The single-step imputation approach produced slightly higher concordance rates than the two-step strategy (99.1% vs. 98.4% when using the HapMap phase II combined panel), but at the expense of a lower proportion of non-missing genotypes (85.5% vs. 90.1%). The 1000 Genomes reference sample produced concordance rates similar to those of the HapMap phase II panel (98.4% for both datasets, using the two-step strategy). However, the 1000 Genomes reference sample substantially increased the proportion of non-missing genotypes (94.7% vs. 90.1%). Rare variants (frequency <1%) had lower imputation accuracy and efficacy than common markers.

Conclusions: The program IMPUTE showed excellent imputation performance for common alleles in an admixed sample from Mexico City, whose ancestry is primarily Native American (62%) and European (33%). Genotype concordances were at least 98.4% with all the imputation strategies, even though no Native American samples are present in the HapMap and 1000 Genomes reference panels. The best balance of imputation accuracy and efficacy was obtained with the 1000 Genomes panel. Rare variants were not captured effectively by any of the available panels, which emphasizes the need for caution when interpreting association results for imputed rare variants.
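To make the two evaluation measures concrete, here is a minimal Python sketch of how concordance and the proportion of non-missing genotypes could be computed for masked genotypes at a given posterior probability threshold. It is an illustration, not the authors' pipeline: the function names, the 0/1/2 genotype coding, the toy probabilities and the threshold values (0.9 and 0.4) are all assumptions made for this example. The only thing taken from the abstract is the definition of the two measures.

```python
from typing import Optional, Sequence, Tuple

# Assumed representation: posterior probabilities of the three genotypes
# (coded 0, 1, 2 copies of one allele) for a single masked genotype.
Probs = Tuple[float, float, float]


def call_genotype(probs: Probs, threshold: float) -> Optional[int]:
    """Return the most probable genotype, or None (missing) if its
    posterior probability is below the threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def evaluate(
    posteriors: Sequence[Probs],
    observed: Sequence[int],
    threshold: float,
) -> Tuple[Optional[float], float]:
    """Return (concordance, efficacy) for a set of masked genotypes.

    concordance: fraction of called genotypes that match the observed call
                 (None if nothing passes the threshold).
    efficacy:    fraction of masked genotypes that receive a call at all.
    """
    if len(posteriors) != len(observed) or not observed:
        raise ValueError("posteriors and observed must be non-empty and equal in length")
    calls = [call_genotype(p, threshold) for p in posteriors]
    called = [(c, o) for c, o in zip(calls, observed) if c is not None]
    efficacy = len(called) / len(observed)
    concordance = sum(c == o for c, o in called) / len(called) if called else None
    return concordance, efficacy


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_posteriors = [
        (0.98, 0.02, 0.00),
        (0.05, 0.93, 0.02),
        (0.50, 0.45, 0.05),
        (0.01, 0.04, 0.95),
        (0.92, 0.08, 0.00),
    ]
    toy_observed = [0, 1, 1, 2, 1]
    for t in (0.9, 0.4):
        print(t, evaluate(toy_posteriors, toy_observed, t))
```

On this toy data, the stricter threshold leaves one genotype uncalled and gives a higher concordance among the remaining calls, while the looser threshold calls every genotype at a lower concordance. This is the same kind of trade-off between accuracy and efficacy that the abstract reports.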
