Methods for Human Demographic Inference Using Haplotype Patterns From Genomewide Single-Nucleotide Polymorphism Data

We propose a novel approximate-likelihood method to fit demographic models to human genomewide single-nucleotide polymorphism (SNP) data. We divide the genome into windows of constant genetic map width and then tabulate the number of distinct haplotypes and the frequency of the most common haplotype for each window. We summarize the data by the genomewide joint distribution of these two statistics, termed the HCN statistic. Coalescent simulations are used to generate the expected HCN statistic for different demographic parameters. The HCN statistic provides additional information for disentangling complex demography beyond statistics based on single-SNP frequencies. Application of our method to simulated data shows that it can reliably infer parameters of growth and bottleneck models, even in the presence of recombination hotspots, provided the hotspots are properly modeled. We also examined how practical problems with genomewide data sets, such as errors in the genetic map, haplotype phase uncertainty, and SNP ascertainment bias, affect our method. Several modifications of our method served to make it robust to these problems. We have applied our method to data collected by Perlegen Sciences and find evidence for a severe population size reduction in northwestern Europe starting 32,500–47,500 years ago.
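The window-based tabulation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `hcn_statistic`, the input layout (phased haplotypes as equal-length strings, SNP positions in centimorgans), and the choice to record the most common haplotype as a count rather than a frequency are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def hcn_statistic(haplotypes, positions_cm, width_cm):
    """Joint distribution of (distinct haplotypes, count of most common haplotype).

    haplotypes   -- list of equal-length strings, one phased haplotype each (assumed layout)
    positions_cm -- genetic-map position of each SNP column, in centimorgans
    width_cm     -- constant genetic-map width of each window
    """
    # Assign every SNP column to a window of fixed genetic-map width.
    windows = defaultdict(list)
    for col, pos in enumerate(positions_cm):
        windows[int(pos // width_cm)].append(col)

    # Tabulate the two per-window statistics; windows without SNPs are skipped.
    joint = Counter()
    for cols in windows.values():
        local = Counter(tuple(h[c] for c in cols) for h in haplotypes)
        n_distinct = len(local)          # number of distinct haplotypes in the window
        top_count = max(local.values())  # copies of the most common haplotype
        joint[(n_distinct, top_count)] += 1

    # Normalize window counts into a genomewide joint distribution.
    total = sum(joint.values())
    return {key: count / total for key, count in joint.items()}


# Toy data (hypothetical): 4 haplotypes, 4 SNPs, two windows of 0.25 cM.
toy_haplotypes = ["0011", "0011", "0110", "1111"]
toy_positions = [0.05, 0.15, 0.30, 0.35]
print(hcn_statistic(toy_haplotypes, toy_positions, 0.25))
```

Dividing the second component by the number of sampled haplotypes gives the frequency form used in the text. Under this sketch, the same function could be applied to both observed and coalescent-simulated haplotypes, so that the two joint distributions are directly comparable.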
