Kinpute: using identity by descent to improve genotype imputation

MOTIVATION: Genotype imputation, though generally accurate, often leaves many genotypes poorly imputed, particularly in studies where the individuals are not well represented by standard reference panels. When individuals in the study share regions of the genome identical by descent (IBD), this information can be combined with a study-specific reference panel (SSRP) to improve the imputation results. Kinpute uses IBD information (due to either recent, familial relatedness or distant, unknown ancestors) in conjunction with the output from linkage disequilibrium (LD) based imputation methods to compute more accurate genotype probabilities. Kinpute uses a novel method for IBD imputation, which works even in the absence of a pedigree and results in substantially improved imputation quality.

RESULTS: Given initial estimates of average IBD between subjects in the study sample, Kinpute uses a novel algorithm to select an optimal set of individuals to sequence and use as an SSRP. Kinpute takes as input both this SSRP and the genotype probabilities output by other LD-based imputation software, and uses a new method to combine the LD-imputed genotype probabilities with IBD configurations to substantially improve imputation. We tested Kinpute on a human population isolate in which 98 individuals have been sequenced. In half of this sample, whose sequence data was masked, we used Impute2 to perform LD-based imputation and then Kinpute to obtain more accurate genotype probabilities. Measures of imputation accuracy improved significantly, particularly for those genotypes that Impute2 imputed with low certainty.

AVAILABILITY: Kinpute is an open-source and freely available C++ software package that can be downloaded from.

SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
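The idea of combining LD-imputed genotype probabilities with IBD information can be illustrated with a toy calculation. The sketch below is not Kinpute's algorithm, which the abstract does not specify. It assumes a single biallelic variant, one sequenced relative with a known genotype, Hardy-Weinberg proportions, and conditional independence of the LD and IBD evidence given the true genotype. All function and parameter names (`hwe`, `ibd_conditional`, `combine`, `delta`, `q`) are hypothetical.

```python
def hwe(q):
    """Hardy-Weinberg probabilities of carrying 0, 1, 2 alternate alleles (frequency q)."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def ibd_conditional(g_ref, k, q):
    """P(target genotype | relative has genotype g_ref and they share k alleles IBD)."""
    if k == 0:
        # No sharing: the relative is uninformative, fall back to population frequencies.
        return hwe(q)
    if k == 2:
        # Both alleles shared: the target genotype equals the relative's genotype.
        return [1.0 if g == g_ref else 0.0 for g in range(3)]
    # One allele shared: it is alternate with probability g_ref / 2,
    # and the unshared allele is alternate with probability q.
    s = g_ref / 2.0
    return [(1 - s) * (1 - q), s * (1 - q) + (1 - s) * q, s * q]


def combine(ld_probs, g_ref, delta, q):
    """Combine LD-based genotype probabilities with IBD evidence.

    ld_probs: LD-imputed probabilities for genotypes 0, 1, 2.
    delta:    assumed probabilities of sharing 0, 1, 2 alleles IBD with the relative.
    q:        alternate allele frequency, assumed strictly between 0 and 1.
    """
    prior = hwe(q)
    ibd = [
        sum(delta[k] * ibd_conditional(g_ref, k, q)[g] for k in range(3))
        for g in range(3)
    ]
    # Both sources are treated as posteriors under the same Hardy-Weinberg prior,
    # so the prior is divided out once to avoid counting it twice.
    weights = [ld_probs[g] * ibd[g] / prior[g] for g in range(3)]
    total = sum(weights)
    if total == 0:
        # The two sources are incompatible; keep the LD result in this toy version.
        return list(ld_probs)
    return [w / total for w in weights]
```

Under these assumptions, an uncertain LD call such as `[0.5, 0.3, 0.2]` is left unchanged when no IBD sharing exists (`delta = (1, 0, 0)`) and collapses onto the relative's genotype when both alleles are shared (`delta = (0, 0, 1)`), which mirrors the abstract's claim that the largest gains occur for genotypes imputed with low certainty.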
