A combined reference panel from the 1000 Genomes and UK10K projects improved rare variant imputation in European and Chinese samples

Imputation using the 1000 Genomes haplotype reference panel has been widely adopted to estimate genotypes in genome-wide association studies. We evaluated imputation quality with a larger reference panel and with a panel composed of different ethnic populations. To do this, we conducted imputation in the Framingham Heart Study and the North Chinese Study using a combined reference panel from the 1000 Genomes (N = 1,092) and UK10K (N = 3,781) projects. For rare variants with 0.01% < MAF ≤ 0.5% in the Framingham Heart Study, the combined reference panel increased the proportion of well-imputed genotypes (imputation quality score ≥ 0.4) from 62.9% to 76.1%, compared with imputation using the 1000 Genomes panel alone. For the North Chinese samples, imputation of rare variants in the same MAF range with the combined reference panel increased the proportion of well-imputed genotypes from 49.8% to 61.8%. The gain in imputation success was smaller in the North Chinese samples, possibly because the UK10K and combined reference panels are predominantly of European ancestry. Our results underscore the importance and potential of larger reference panels for imputing rare variants. They also suggest that adding ethnicity-specific variants to reference panels may improve genotype imputation in some ethnic groups.
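The headline percentages give the share of rare variants in the 0.01% < MAF ≤ 0.5% bin whose imputation quality score reaches 0.4. The sketch below shows that arithmetic using per-variant (alternate-allele frequency, quality score) pairs. It rests on several assumptions. The function names, input format and toy values are hypothetical. The abstract does not name the quality metric, which could be an info or R² score reported by the imputation software. It also does not say which allele frequency defines the MAF bin. The sketch counts each variant once, so read it as an illustration of the calculation rather than the authors' pipeline.

```python
from typing import Iterable, Tuple

# Rare-variant bin from the abstract, 0.01% < MAF <= 0.5%, as fractions.
MAF_LOW = 0.0001
MAF_HIGH = 0.005
# Quality threshold the abstract uses to call a genotype "well imputed".
QUALITY_THRESHOLD = 0.4


def minor_allele_frequency(alt_freq: float) -> float:
    """Fold an alternate-allele frequency into a minor allele frequency."""
    return min(alt_freq, 1.0 - alt_freq)


def well_imputed_fraction(
    variants: Iterable[Tuple[float, float]],
    maf_low: float = MAF_LOW,
    maf_high: float = MAF_HIGH,
    threshold: float = QUALITY_THRESHOLD,
) -> float:
    """Share of variants with MAF in (maf_low, maf_high] and quality >= threshold.

    Each item is a hypothetical (alt_allele_frequency, quality_score) pair.
    Returns 0.0 when no variant falls inside the MAF bin.
    """
    in_bin = 0
    passed = 0
    for alt_freq, quality in variants:
        maf = minor_allele_frequency(alt_freq)
        if maf_low < maf <= maf_high:  # lower bound exclusive, upper inclusive
            in_bin += 1
            if quality >= threshold:
                passed += 1
    return passed / in_bin if in_bin else 0.0


# Toy, made-up values: four variants fall in the bin and three pass.
toy_variants = [
    (0.002, 0.85),    # in bin, passes
    (0.003, 0.35),    # in bin, fails the 0.4 threshold
    (0.999, 0.50),    # MAF 0.001 after folding, in bin, passes
    (0.004, 0.41),    # in bin, passes
    (0.02, 0.95),     # MAF 2%, outside the rare bin
    (0.00005, 0.90),  # MAF below 0.01%, outside the rare bin
]
print(f"{well_imputed_fraction(toy_variants):.1%}")  # prints 75.0%
```

Running the same calculation once with the 1000 Genomes panel and once with the combined panel gives the kind of before/after comparison the abstract reports, for example 62.9% versus 76.1% in the Framingham Heart Study.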
