A Hidden Markov Model for Investigating Recent Positive Selection through Haplotype Structure

Recent positive selection can increase the frequency of an advantageous mutant rapidly enough that a relatively long ancestral haplotype remains intact around it. We present a hidden Markov model (HMM) to identify such haplotype structures. Using the HMM-identified haplotype structures, we then adopt a population genetic model for the extent of ancestral haplotypes to infer the selection intensity and the allele age. Simulations show that this method can detect selection under a wide range of conditions and has higher power than the existing frequency spectrum-based method. In addition, it provides good estimates of the selection coefficients and allele ages for strong selection. The method analyzes large data sets in a reasonable amount of running time. We apply the method to HapMap III data in a genome scan and identify a list of candidate regions putatively under recent positive selection. We also apply it to several genes known to be under recent positive selection, including the LCT, KITLG and TYRP1 genes in Northern Europeans and OCA2 in East Asians, to estimate their allele ages and selection coefficients.
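
To make the idea of using an HMM to delimit an intact ancestral haplotype more concrete, the following is a minimal sketch of a generic two-state HMM decoded with the Viterbi algorithm. It is not the model from the paper: the two states ("A" for ancestral haplotype, "B" for background), the function name `viterbi_two_state`, the 0/1 match encoding, and all parameter values (`p_switch`, `p_match_anc`, `p_match_bg`) are assumptions chosen only for illustration.

```python
import math


def viterbi_two_state(obs, p_switch=0.05, p_match_anc=0.99, p_match_bg=0.6):
    """Decode a hypothetical two-state HMM along a chromosome.

    obs: list of 0/1 flags, 1 meaning the allele at that marker matches a
         reference (core) haplotype. This encoding is an illustrative assumption.
    Returns a list of state labels: "A" (ancestral) or "B" (background).
    """
    if not obs:
        return []

    states = ("A", "B")
    # Assumed probability that a marker matches the reference in each state.
    match_prob = {"A": p_match_anc, "B": p_match_bg}

    def log_emit(state, o):
        p = match_prob[state] if o == 1 else 1.0 - match_prob[state]
        return math.log(p)

    log_stay = math.log(1.0 - p_switch)
    log_move = math.log(p_switch)

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_move

    # Initialisation with an equal prior on both states.
    score = {s: math.log(0.5) + log_emit(s, obs[0]) for s in states}
    backpointers = []

    # Recursion: best log-probability of any path ending in each state.
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + log_trans(r, s))
            pointers[s] = best_prev
            new_score[s] = score[best_prev] + log_trans(best_prev, s) + log_emit(s, o)
        score = new_score
        backpointers.append(pointers)

    # Traceback from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Ten matching markers followed by ten mismatching markers.
    print("".join(viterbi_two_state([1] * 10 + [0] * 10)))
```

In this toy setting the decoded run of "A" states plays the role of the intact ancestral segment. Its length is the kind of quantity a downstream population genetic model could use, although the paper's own emission and transition structure is not specified in the abstract.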
