Recovering population parameters from a single gene genealogy: an unbiased estimator of the growth rate.

We show that the number of lineages ancestral to a sample, viewed as a function of time back into the past, is a nearly deterministic property of large-sample gene genealogies; we call this quantity the number of lineages as a function of time (NLFT). We obtain analytic expressions for the NLFT for both constant-sized and exponentially growing populations. Because the NLFT of a large sample shows little stochastic variation, it is well suited to estimating population parameters. On this basis, we develop a new computational method for inferring the size and growth rate of a population from a large sample of DNA sequences at a single locus. We first apply the method to a sample of 1,212 mitochondrial DNA (mtDNA) sequences from China, confirming a pattern of recent population growth previously identified using other techniques, but with much smaller confidence intervals for past population sizes owing to the low variation of the NLFT. We then analyze a set of 63 mtDNA sequences from blue whales (BWs) and conclude that the population grew in the past. This calls for a reevaluation of previous studies that assumed the BW population was constant in size.
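The claim that the NLFT is nearly deterministic can be illustrated with a small simulation. The sketch below is not the method of the paper: it assumes the standard Kingman coalescent for a constant-sized haploid population, with time measured in units of N generations so that each pair of lineages coalesces at rate 1, and it compares simulated lineage counts with a simple mean-field curve obtained by solving dn/dt = -n(n-1)/2. The function names `simulate_nlft` and `mean_field_nlft` are hypothetical, and the exponentially growing case is not covered.

```python
import math
import random


def simulate_nlft(n0, times, rng):
    """Simulate one Kingman coalescent genealogy for a sample of n0 lineages
    and return the number of ancestral lineages at each requested time.

    Assumption: constant population size, time in units of N generations,
    so k lineages coalesce at total rate k * (k - 1) / 2.
    """
    times = sorted(times)
    counts = []
    t = 0.0          # current time back into the past
    k = n0           # current number of lineages
    idx = 0          # index of the next requested time
    while idx < len(times):
        if k == 1:
            # Only the common ancestor remains at all later times.
            counts.append(1)
            idx += 1
            continue
        wait = rng.expovariate(k * (k - 1) / 2.0)
        # Every requested time before the next coalescence sees k lineages.
        while idx < len(times) and times[idx] < t + wait:
            counts.append(k)
            idx += 1
        t += wait
        k -= 1
    return counts


def mean_field_nlft(n0, t):
    """Solution of dn/dt = -n(n-1)/2 with n(0) = n0 (deterministic sketch)."""
    u0 = 1.0 - 1.0 / n0
    return 1.0 / (1.0 - u0 * math.exp(-t / 2.0))


if __name__ == "__main__":
    rng = random.Random(1)
    times = [0.001, 0.01, 0.1, 1.0]
    for rep in range(3):
        print("replicate", rep, simulate_nlft(1000, times, rng))
    print("mean field ", [round(mean_field_nlft(1000, t), 1) for t in times])
```

Running the script prints a few independent replicates next to the mean-field values. For a sample of 1,000 lineages the replicates should differ from one another by only a few percent at early times, which is the low-variance behavior the abstract describes.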
