Genotype Imputation with Thousands of Genomes

Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package.
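The idea of choosing a custom reference panel for each study haplotype in each region can be illustrated with a toy sketch. This is not the IMPUTE2 implementation. It assumes haplotypes are coded as 0/1 allele lists over the same typed sites, measures local similarity by Hamming distance, and uses fixed-size windows; the function names (`select_local_panel`, `panels_by_window`) and the parameters `window` and `k` are hypothetical names introduced only for this illustration.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def select_local_panel(study_hap: Sequence[int],
                       ref_haps: Sequence[Sequence[int]],
                       k: int) -> List[int]:
    """Return indices of the k reference haplotypes closest to study_hap.

    Ties are broken by reference index so the result is deterministic.
    """
    order = sorted(range(len(ref_haps)),
                   key=lambda i: (hamming(study_hap, ref_haps[i]), i))
    return order[:k]


def panels_by_window(study_hap: Sequence[int],
                     ref_haps: Sequence[Sequence[int]],
                     window: int,
                     k: int) -> List[List[int]]:
    """Choose a separate k-haplotype panel for each window of typed sites."""
    panels = []
    for start in range(0, len(study_hap), window):
        end = start + window
        local_refs = [ref[start:end] for ref in ref_haps]
        panels.append(select_local_panel(study_hap[start:end], local_refs, k))
    return panels


# Toy data (invented for illustration): one study haplotype, three references.
study = [0, 1, 1, 0, 1, 0, 0, 1]
refs = [
    [0, 1, 1, 0, 0, 1, 1, 0],  # identical to the study haplotype in window 1
    [1, 0, 0, 1, 1, 0, 0, 1],  # identical to the study haplotype in window 2
    [0, 1, 0, 0, 1, 0, 1, 1],  # one mismatch in each window
]
panels = panels_by_window(study, refs, window=4, k=2)
print(panels)  # [[0, 2], [1, 2]]
```

In this toy example the selected panel changes between windows, which is the property the abstract describes: the full reference set stays available, but only the locally most similar haplotypes are used for a given study haplotype in a given region.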
