Likelihoods From Summary Statistics: Recent Divergence Between Species

We describe an importance-sampling method for approximating likelihoods of population parameters based on multiple summary statistics. In this first application, we address the demographic history of closely related members of the Drosophila pseudoobscura group. We base the maximum-likelihood estimation of the time since speciation and the effective population sizes of the extant and ancestral populations on the pattern of nucleotide variation at DPS2002, a noncoding region tightly linked to a paracentric inversion that strongly contributes to reproductive isolation. Consideration of summary statistics rather than entire nucleotide sequences permits a compact description of the genealogy of the sample. We use importance sampling first to propose a genealogical and mutational history consistent with the observed array of summary statistics and then to correct the likelihood with the exact probability of the history determined from a system of recursions. Analysis of a subset of the data, for which recursive computation of the exact likelihood was feasible, indicated close agreement between the approximate and exact likelihoods. Our results for the complete data set also compare well with those obtained through Metropolis-Hastings sampling of fully resolved genealogies of entire nucleotide sequences.
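The propose-then-reweight idea can be illustrated with a deliberately small toy case. This is a sketch under stated assumptions, not the method of the paper. It assumes a sample of two sequences from a single population, the infinite-sites model, and a single summary statistic: the number of segregating sites s. With the coalescence time T measured so that T ~ Exp(1) and the mutation count is Poisson(θT), the exact likelihood is θ^s / (1 + θ)^(s+1), which gives a check on the approximation. The proposal distribution, the driving value `theta0`, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def exact_likelihood(s, theta):
    """Exact P(S = s | theta) for a sample of two under infinite sites (toy model)."""
    return theta ** s / (1.0 + theta) ** (s + 1)


def is_likelihood(s, theta, theta0=1.0, n_draws=20000, seed=0):
    """Importance-sampling estimate of P(S = s | theta).

    Assumed proposal: T ~ Gamma(shape = s + 1, rate = 1 + theta0), which is
    the conditional distribution of T given s at the driving value theta0.
    Each draw is weighted by (joint density of T and s at theta) / (proposal density).
    """
    rng = random.Random(seed)
    shape, rate = s + 1, 1.0 + theta0
    total = 0.0
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        t = rng.gammavariate(shape, 1.0 / rate)
        # log of Exp(1) density of t plus log of Poisson(theta * t) mass at s
        log_joint = -t - theta * t + s * math.log(theta * t) - math.lgamma(s + 1)
        # log of the Gamma(shape, rate) proposal density at t
        log_proposal = (shape * math.log(rate) + (shape - 1) * math.log(t)
                        - rate * t - math.lgamma(shape))
        total += math.exp(log_joint - log_proposal)
    return total / n_draws


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0):
        print(theta, is_likelihood(3, theta), exact_likelihood(3, theta))
```

When `theta` equals `theta0`, every importance weight equals the exact likelihood, so the estimate has zero variance. Away from the driving value the weights vary and more draws are needed, which is why the choice of proposal matters in practice.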
