FAPI: Fast and accurate P-value Imputation for genome-wide association study

Imputing individual-level genotypes (genotype imputation) is now a standard procedure in genome-wide association studies (GWAS) for examining disease associations at untyped common genetic variants. Meta-analysis of publicly available GWAS summary statistics can reveal more disease-associated loci, but these data are usually provided for differing variant sets. Traditional genotype imputation methods cannot map such summary statistics onto a common reference panel for meta-analysis, because they operate on individual-level genotypes. Here we develop a fast and accurate P-value imputation (FAPI) method that uses only summary statistics of common variants. Its computational cost is linear in the number of untyped variants, and its accuracy is similar to that of IMPUTE2 with prephasing, one of the leading genotype imputation methods. In addition, based on the FAPI idea, we develop a metric to detect abnormal association at a variant and show that it has significantly greater power than LD-PAC, a method that quantifies the evidence of spurious associations using a likelihood ratio. Our method is implemented in a user-friendly software tool, which is available at http://statgenpro.psychiatry.hku.hk/fapi.
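To make the idea of imputing association statistics without genotypes concrete, the sketch below shows a generic Gaussian conditional-mean approach: z-scores at untyped variants are predicted from z-scores at typed variants using linkage disequilibrium (LD) correlations estimated from a reference panel. This is not the FAPI algorithm, whose details the abstract does not give. The function names `impute_z` and `z_to_p`, the `ridge` parameter, and its default value are all assumptions made for illustration.

```python
import math
import numpy as np


def impute_z(z_typed, ld_tt, ld_ut, ridge=0.1):
    """Conditional-mean z-scores at untyped variants (illustrative only).

    z_typed : z-scores at the typed variants, length n_typed
    ld_tt   : LD correlation matrix among typed variants (n_typed x n_typed)
    ld_ut   : LD correlations between untyped and typed variants
              (n_untyped x n_typed)
    ridge   : assumed regularisation added to the diagonal of ld_tt so the
              system stays well conditioned with a finite reference panel
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    ld_ut = np.asarray(ld_ut, dtype=float)

    regularised = ld_tt + ridge * np.eye(ld_tt.shape[0])
    # weights = ld_ut @ inv(regularised), computed with a linear solve
    # (valid because the regularised LD matrix is symmetric).
    weights = np.linalg.solve(regularised, ld_ut.T).T
    return weights @ z_typed


def z_to_p(z_scores):
    """Two-sided P-values under a standard normal null."""
    return [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores]


if __name__ == "__main__":
    # Toy example: two typed variants and one untyped variant.
    ld_tt = [[1.0, 0.5], [0.5, 1.0]]
    ld_ut = [[0.8, 0.4]]
    z_hat = impute_z([3.0, 1.0], ld_tt, ld_ut)
    print(z_hat, z_to_p(z_hat))
```

The conditional mean is shrunk toward zero whenever the untyped variant is imperfectly tagged, so P-values derived from it directly tend to be conservative; practical tools typically add a variance adjustment and restrict the typed variants to a local window around each untyped variant.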
