Improving accuracy of rare variant imputation with a two-step imputation approach

Genotype imputation has been a pillar of the success of genome-wide association studies (GWAS) in identifying common variants associated with common diseases. However, most GWAS have used only 60 HapMap samples as the imputation reference, so less frequent and rare variants have not been comprehensively scrutinized. Next-generation arrays with sufficient coverage, together with new reference panels such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) <5%). In this study, we present a two-step imputation approach that improves the quality of 1000 Genomes imputation by genotyping only a subset of samples on a dense array with many low-frequency markers to create a local reference population. In this approach, the study sample, genotyped with a first-generation array, is first imputed to the local reference sample genotyped on the dense array and then to the 1000 Genomes reference panel. We show that with this approach mean imputation quality, measured by r2, increases by 28% for variants with a MAF between 1 and 5% compared with direct imputation to the 1000 Genomes reference. Similarly, the concordance rate between imputed and true genotype calls was significantly higher for heterozygotes (P<1e-15) and rare homozygote calls (P<1e-15) in this low-frequency range. In our setting, the two-step approach improves imputation quality compared with traditional direct imputation, notably in the low-frequency spectrum, and is a cost-effective strategy for large epidemiological studies.
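
The two quality measures named above can be illustrated with a minimal sketch. The abstract does not state how r2 or the concordance rate was computed, so the code below rests on assumptions: r2 is taken as the squared Pearson correlation between true allele counts (0, 1, 2) and imputed dosages, and concordance is the fraction of best-guess calls (dosage rounded to the nearest allele count) that match the true genotype, reported per true genotype class. The function names `dosage_r2` and `concordance_by_class` are hypothetical and do not come from any imputation package.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts and imputed dosages.

    Assumption: this is one common definition of imputation r2; the study
    may have used a different estimator.
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # Monomorphic variant: correlation is undefined.
        return float("nan")
    return (cov * cov) / (var_t * var_d)


def concordance_by_class(true_genotypes, imputed_dosages):
    """Fraction of best-guess calls matching the truth, per true genotype class.

    Keys are true allele counts: 0 (common homozygote), 1 (heterozygote),
    2 (rare homozygote, assuming dosages count the minor allele).
    Classes with no samples map to None.
    """
    matched = {0: 0, 1: 0, 2: 0}
    total = {0: 0, 1: 0, 2: 0}
    for t, d in zip(true_genotypes, imputed_dosages):
        # Round half up, then clamp to the valid range 0..2.
        call = min(2, max(0, int(d + 0.5)))
        total[t] += 1
        if call == t:
            matched[t] += 1
    return {g: (matched[g] / total[g] if total[g] else None) for g in total}


# Toy data for a single variant (illustrative values, not study data).
truth = [0, 0, 1, 2, 1, 0]
dosage = [0.1, 0.0, 0.9, 1.8, 0.4, 0.2]
r2 = dosage_r2(truth, dosage)
rates = concordance_by_class(truth, dosage)
```

Under these assumptions, comparing direct and two-step imputation would amount to computing both measures per variant for each strategy and summarizing them within MAF bins such as 1 to 5%.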
