Benefits of Accurate Imputations in GWAS

Imputation has been proposed as an efficient way to increase both the utility and the coverage of genome-wide association studies, especially when combining data generated on different genotyping arrays. We aim to demonstrate that imputed genotypes are highly accurate and that association analysis of imputed data does not inflate the results. Instead, imputation increases the power of the dataset without introducing systematic bias. The majority of common variants can be imputed with very high accuracy (r² > 0.9), and we validated imputation accuracy by comparing actual genotypes from low-throughput genotyping assays against the imputed genotypes. Imputation was performed using IMPUTE2 with the 1000 Genomes cosmopolitan reference panel, which yielded about 38 million SNPs. After quality control and filtering, we performed case-control association tests on 3,159,556 markers. We compare the results obtained from genotyped and imputed data and also assess how accurately ancestry is determined from imputed data.
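
The validation step described above, comparing assayed genotypes against imputed genotypes, can be illustrated with a minimal sketch. It assumes that r² means the squared Pearson correlation between true allele counts (0/1/2) and imputed allele dosages at a single SNP, and that concordance is the fraction of samples whose best-guess imputed call matches the assay. The abstract does not specify the exact metric or any code, so the function names and the toy data below are hypothetical.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between assayed allele counts (0/1/2)
    and imputed allele dosages (0.0-2.0) for one SNP.
    Assumption: this is the r^2 accuracy measure; the source does not say."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_g = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((g - mean_g) * (d - mean_d)
              for g, d in zip(true_genotypes, imputed_dosages))
    var_g = sum((g - mean_g) ** 2 for g in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_g == 0 or var_d == 0:
        return float("nan")  # monomorphic SNP: correlation is undefined
    return cov * cov / (var_g * var_d)


def concordance(true_genotypes, imputed_dosages):
    """Fraction of samples whose best-guess call (dosage rounded to the
    nearest of 0, 1, 2) equals the assayed genotype."""
    if not true_genotypes or len(true_genotypes) != len(imputed_dosages):
        raise ValueError("need two equal-length, non-empty sequences")
    calls = [min(2, max(0, int(d + 0.5))) for d in imputed_dosages]
    matches = sum(1 for g, c in zip(true_genotypes, calls) if g == c)
    return matches / len(true_genotypes)


# Toy data (hypothetical): six samples at one SNP.
assayed = [0, 1, 2, 1, 0, 2]
imputed = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]
r2 = dosage_r2(assayed, imputed)
conc = concordance(assayed, imputed)
```

Under these assumptions, a SNP would count as accurately imputed when `r2` exceeds 0.9. Correlation on dosages rewards well-calibrated uncertainty, whereas concordance only scores the hard call, so the two measures can disagree for low-frequency variants.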
