Paper information - Evaluation and application of summary statistic imputation to discover new height-associated loci

Abstract

As most of the heritability of complex traits is attributed to common and low-frequency genetic variants, imputing these variants by combining genotyping chips with large sequenced reference panels is the most cost-effective approach to discovering the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-larger reference panels is very cumbersome, as it requires reimputing the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to impute the summary statistics directly, termed summary statistics imputation. Its performance relative to genotype imputation and its practical utility have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, while genotype imputation achieves a 2- to 5-fold lower root-mean-square error (RMSE), summary statistics imputation better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded an increase in statistical power of 15%, 10% and 3%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data from the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Author summary

Genome-wide association studies (GWASs) quantify the effect of genetic variants on traits, such as height. Such estimates are called association summary statistics and are usually shared publicly through publication. Typically, GWASs are carried out by genotyping ~500,000 single nucleotide variants (SNVs) in each individual; these genotypes are then combined with sequenced reference panels to infer untyped SNVs in each individual's genome. This process of genotype imputation is resource intensive and can therefore be a limitation when combining many GWASs. An alternative approach is to bypass the use of individual-level data and impute summary statistics directly. In our work we compare the performance of summary statistics imputation to that of genotype imputation. Although we observe a 2- to 5-fold lower RMSE for genotype imputation compared to summary statistics imputation, summary statistics imputation better distinguishes true associations from null results. Furthermore, we demonstrate the potential of summary statistics imputation by presenting 34 novel height-associated loci, 19 of which were confirmed in the UK Biobank. Our study demonstrates that, given current reference panels, summary statistics imputation is a very efficient and cost-effective way to identify common or low-frequency trait-associated loci.
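The idea of imputing summary statistics directly can be illustrated with a minimal sketch. It assumes the commonly used conditional-expectation formulation, in which the Z-statistic of an untyped variant is estimated as a weighted sum of the Z-statistics of nearby typed variants, with weights derived from linkage disequilibrium (LD) correlations estimated in a reference panel. The function name `impute_z`, its parameters, and the shrinkage scheme are illustrative assumptions, not the exact estimator evaluated in the paper.

```python
import numpy as np

def impute_z(z_typed, ld_typed, ld_untyped_typed, ridge=0.1):
    """Impute the Z-statistic of one untyped variant (illustrative sketch).

    z_typed          : Z-statistics of the typed variants, shape (m,)
    ld_typed         : LD correlation matrix among typed variants, shape (m, m)
    ld_untyped_typed : LD correlations between the untyped variant and each
                       typed variant, shape (m,)
    ridge            : shrinkage toward the identity matrix (an assumption made
                       here to keep the LD matrix well conditioned)
    """
    z = np.asarray(z_typed, dtype=float)
    C = np.asarray(ld_typed, dtype=float)
    c = np.asarray(ld_untyped_typed, dtype=float)

    # Regularised LD matrix: (1 - ridge) * C + ridge * I
    C_reg = (1.0 - ridge) * C + ridge * np.eye(C.shape[0])

    # Weights w solve C_reg w = c, so the estimate is z_hat = c' C_reg^{-1} z
    weights = np.linalg.solve(C_reg, c)
    z_hat = float(weights @ z)

    # Rough imputation-quality measure c' C_reg^{-1} c, clipped to [0, 1]
    r2 = float(np.clip(weights @ c, 0.0, 1.0))
    return z_hat, r2


# Toy usage: two independent typed variants, no shrinkage.
z_hat, r2 = impute_z([2.0, 4.0], np.eye(2), [0.5, 0.25], ridge=0.0)
print(z_hat, r2)
```

In this toy case the typed variants are uncorrelated, so the weights equal the LD correlations themselves. Because no individual-level genotypes enter the calculation, only the reference-panel LD estimates need to be recomputed when a larger panel becomes available.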

Aaron F. McDaid | Z. Kutalik | S. Rüeger
