Minimac2: Faster Genotype Imputation

Genotype imputation is a key step in the analysis of genome-wide association studies. Upcoming very large reference panels, such as those from The 1000 Genomes Project and the Haplotype Consortium, will improve imputation quality for rare and less common variants, but will also increase the computational burden. Here, we demonstrate how the application of software engineering techniques can help keep imputation broadly accessible. Overall, these improvements speed up imputation by an order of magnitude compared with our previous implementation.

Availability and implementation: minimac2, including source code, documentation, and examples, is available at http://genome.sph.umich.edu/wiki/Minimac2
