Intercoalescence Time Distribution of Incomplete Gene Genealogies in Temporally Varying Populations, and Applications in Population Genetic Inference

Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached its common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at time T, this paper derives the exact distribution and expectation of the intercoalescence times conditional on T for populations of deterministically time-varying size, and specifically for exponentially growing populations. The derived intercoalescence time distribution can be integrated into the coalescent-based joint allele frequency spectrum (JAFS) theory and is useful for population genetic inference from large-scale genomic data without relying on computationally intensive approaches such as importance sampling and Markov chain Monte Carlo (MCMC) methods. Inference of several important parameters from this conditional distribution is demonstrated: quantifying the population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm the validity of the derivation and the statistical efficiency of methods based on the derived distribution. Two real-data examples are given: inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and inference of the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations.
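To make the setting concrete, the truncated genealogy described above can be sketched by simulation. This is a minimal illustration, not the exact derivation reported in the paper: it assumes the standard time rescaling for an exponentially growing population, with time measured in coalescent units of the present-day population size and a scaled growth rate `beta`. The function name `simulate_truncated_genealogy` and all parameter names are hypothetical.

```python
import math
import random


def simulate_truncated_genealogy(n, beta, T, rng):
    """Simulate one genealogy of n lineages back to time T.

    Assumes population size N(t) = N0 * exp(-beta * t) going back in time,
    with t in coalescent units of N0 (an assumption of this sketch).
    Returns (intercoalescence_times, m), where m is the number of
    lineages still extant at time T.
    """
    t = 0.0
    k = n
    times = []
    while k > 1:
        # Waiting time in rescaled (constant-size) time with k lineages.
        e = rng.expovariate(k * (k - 1) / 2.0)
        if beta > 0:
            # Map real time to rescaled time, add the wait, map back.
            lam = math.expm1(beta * t) / beta
            t_next = math.log1p(beta * (lam + e)) / beta
        else:
            # Constant population size: no rescaling needed.
            t_next = t + e
        if t_next > T:
            break  # the next coalescence falls beyond the truncation time
        times.append(t_next - t)
        t = t_next
        k -= 1
    return times, k


if __name__ == "__main__":
    rng = random.Random(42)
    reps = 2000
    # Empirical distribution of m for n = 20, beta = 5, T = 0.5.
    counts = {}
    for _ in range(reps):
        _, m = simulate_truncated_genealogy(20, 5.0, 0.5, rng)
        counts[m] = counts.get(m, 0) + 1
    for m in sorted(counts):
        print(m, counts[m] / reps)
```

Repeated draws of this kind give empirical distributions of m and of the intercoalescence times conditional on T, against which a closed-form result could be compared.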
