Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

Genotype imputation is a vital tool in genome-wide association studies (GWAS) and in meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated on different genotyping platforms. HapMap samples are often employed as the reference panel, and more recently the 1000 Genomes Project resource has been becoming the primary source of reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest-growing minority group in the US. However, genotype imputation resources for Latinos are at present rather limited compared with those for individuals of European ancestry, largely because of the lack of good reference data. One candidate reference panel for Latinos is derived from the individuals of Mexican ancestry in Los Angeles included in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study, and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR + CEU + YRI reference panel provides the highest imputation accuracy for Latinos, and that adding Asian samples to the panel can reduce imputation accuracy. We also report the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analyses in Latinos.
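
The abstract does not state which accuracy measure was used, so the following is only an illustrative sketch of how imputation accuracy is commonly scored when genotyped SNPs are masked and then re-imputed. It assumes genotypes coded as allele counts (0, 1, 2) and shows two widely used measures: the concordance rate of best-guess genotypes and the squared correlation between true genotypes and imputed dosages. The function names `concordance` and `dosage_r2` and the example values are hypothetical and are not taken from the study or from MACH.

```python
from statistics import mean


def concordance(true_genotypes, imputed_genotypes):
    """Fraction of masked genotypes (coded 0/1/2) that were imputed correctly."""
    pairs = list(zip(true_genotypes, imputed_genotypes))
    if not pairs:
        raise ValueError("no genotypes to compare")
    return sum(t == g for t, g in pairs) / len(pairs)


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts and imputed dosages."""
    mean_t = mean(true_genotypes)
    mean_d = mean(imputed_dosages)
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # Correlation is undefined for a monomorphic SNP.
        return float("nan")
    return cov * cov / (var_t * var_d)


if __name__ == "__main__":
    # Made-up values for one masked SNP across six individuals.
    true_g = [0, 1, 2, 1, 0, 2]
    best_guess = [0, 1, 2, 0, 0, 2]
    dosages = [0.1, 0.9, 1.8, 0.6, 0.2, 1.9]
    print("concordance:", concordance(true_g, best_guess))
    print("dosage r^2:", dosage_r2(true_g, dosages))
```

Concordance is easy to interpret but tends to look optimistic for low-frequency variants, where always guessing the major homozygote is usually right; the dosage-based squared correlation is less sensitive to allele frequency, which is one reason both are often reported side by side when reference panels are compared.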
