Accurate Imputation of Rare and Common Variants in a Founder Population From a Small Number of Sequenced Individuals

Advances in DNA sequencing technologies have greatly facilitated the discovery of rare genetic variants in the human genome, many of which may contribute to common disease risk. However, evaluating their individual or even collective effects on disease risk requires very large sample sizes, and the study designs needed to reach them are often prohibitively expensive. We present an alternative approach: determining genotypes in large numbers of individuals for all variants discovered by sequencing relatively few individuals. Specifically, we developed a new imputation algorithm that uses whole-exome sequencing data from 25 members of the South Dakota Hutterite population and genome-wide single nucleotide polymorphism (SNP) genotypes from >1,400 individuals from the same founder population. The algorithm relies on identity-by-descent sharing of phased haplotypes, a different strategy from the linkage disequilibrium methods used by most imputation algorithms. We imputed genotypes at variants discovered in the sequence data onto an average of ∼77% of chromosomes among the 1,400 individuals. The median R² between imputed and directly genotyped data was >0.99. As expected, many variants that are vanishingly rare in European populations have risen to higher frequencies in the founder population and would be amenable to single-SNP analyses.

Genet. Epidemiol. 36:312–319, 2012. © 2012 Wiley Periodicals, Inc.
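
The abstract contrasts imputation by identity-by-descent (IBD) sharing with the linkage-disequilibrium approach of most imputation tools. The following Python sketch illustrates only the general idea, not the authors' algorithm. It assumes that phased haplotypes and IBD segments between each target haplotype and the sequenced haplotypes have already been inferred, copies sequenced alleles into those segments, and then reports the fraction of haplotype sites imputed and a per-variant R². Here R² is taken to be the squared Pearson correlation between imputed and directly genotyped allele counts. The segment format, all function names, the toy data, and the rule of leaving sites missing when donors disagree are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input format (not from the paper): an IBD segment is
# (target haplotype id, sequenced haplotype id, first site, last site), inclusive.
IBDSegment = Tuple[str, str, int, int]
Haplotype = List[Optional[int]]  # 0/1 alleles per variant; None = not imputed


def impute_by_ibd(n_sites: int, targets: Sequence[str],
                  sequenced: Dict[str, List[int]],
                  segments: Sequence[IBDSegment]) -> Dict[str, Haplotype]:
    """Copy alleles from a sequenced haplotype onto a target haplotype at every
    site inside a segment they are assumed to share IBD. Sites covered by no
    segment, or by segments whose donors disagree, stay None."""
    imputed = {t: [None] * n_sites for t in targets}
    conflict = {t: [False] * n_sites for t in targets}
    for target, donor_id, start, end in segments:
        donor = sequenced[donor_id]
        for site in range(start, end + 1):
            if conflict[target][site]:
                continue  # already marked discordant
            current = imputed[target][site]
            if current is None:
                imputed[target][site] = donor[site]
            elif current != donor[site]:
                imputed[target][site] = None  # donors disagree: leave missing
                conflict[target][site] = True
    return imputed


def coverage(imputed: Dict[str, Haplotype]) -> float:
    """Fraction of (haplotype, site) cells that received an imputed allele."""
    cells = [a for hap in imputed.values() for a in hap]
    return sum(a is not None for a in cells) / len(cells)


def genotypes(imputed: Dict[str, Haplotype],
              individuals: Dict[str, Tuple[str, str]]) -> Dict[str, Haplotype]:
    """Sum each individual's two haplotypes into 0/1/2 genotypes (None if
    either allele is missing)."""
    return {ind: [None if a is None or b is None else a + b
                  for a, b in zip(imputed[h1], imputed[h2])]
            for ind, (h1, h2) in individuals.items()}


def r_squared(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Squared Pearson correlation; None when either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


def median_r2(imputed_g, observed_g, n_sites: int) -> Optional[float]:
    """Median over variants of R^2 between imputed and directly genotyped
    values, using only individuals that were imputed at each variant."""
    per_site = []
    for site in range(n_sites):
        pairs = [(imputed_g[i][site], observed_g[i][site])
                 for i in observed_g if imputed_g[i][site] is not None]
        if len(pairs) < 2:
            continue
        r2 = r_squared([p[0] for p in pairs], [p[1] for p in pairs])
        if r2 is not None:
            per_site.append(r2)
    return statistics.median(per_site) if per_site else None


# Toy data (invented for illustration): two sequenced haplotypes, three
# genotyped individuals, three variant sites.
sequenced = {"S1": [0, 1, 1], "S2": [1, 0, 1]}
individuals = {"A": ("A1", "A2"), "B": ("B1", "B2"), "C": ("C1", "C2")}
targets = [h for pair in individuals.values() for h in pair]
segments = [("A1", "S1", 0, 2), ("A2", "S1", 0, 2),
            ("B1", "S1", 0, 2), ("B2", "S2", 0, 2),
            ("C1", "S2", 0, 2), ("C2", "S2", 0, 1)]  # C2 not IBD at site 2
imputed = impute_by_ibd(3, targets, sequenced, segments)
imp_g = genotypes(imputed, individuals)
observed = {"A": [0, 2, 2], "B": [1, 1, 2], "C": [2, 0, 2]}
print(imp_g)                          # {'A': [0, 2, 2], 'B': [1, 1, 2], 'C': [2, 0, None]}
print(round(coverage(imputed), 3))    # 0.944
print(median_r2(imp_g, observed, 3))  # 1.0
```

In this toy example, site 2 contributes no R² value because the imputed genotypes there are constant, so the correlation is undefined. The abstract does not say how such variants, or haplotypes with no IBD partner among the sequenced individuals, were handled in the actual study.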
