Inference of population history using coalescent HMMs: review and outlook.

Studying how diverse human populations are related is of historical and anthropological interest, and it also provides a realistic null model when testing for signatures of natural selection or disease associations. Furthermore, understanding the demographic histories of other species is playing an increasingly important role in conservation genetics. A number of statistical methods have been developed to infer population demographic histories using whole-genome sequence data, with recent advances focusing on allowing for more flexible modeling choices, scaling to larger data sets, and increasing statistical power. Here we review coalescent hidden Markov models, a powerful class of population genetic inference methods that can utilize linkage disequilibrium information effectively. We highlight recent advances, give advice for practitioners, point out potential pitfalls, and present possible future research directions.
