Maximum Likelihood Implementation of an Isolation‐with‐Migration Model for Three Species

Abstract

We develop a maximum likelihood (ML) method for estimating migration rates between species from genomic sequence data. A species tree accommodates the phylogenetic relationships among three species, allowing for migration between the two sister species, while the third species serves as an out‐group. A Markov chain characterization of the genealogical process of coalescence and migration is used to integrate out the migration histories at each locus analytically, whereas Gaussian quadrature is used to integrate numerically over the coalescent times on each genealogical tree. This work extends our earlier implementation of the symmetrical isolation‐with‐migration model for three species in two ways: it accommodates loci carrying two or three sequences in any sampling configuration, and it allows asymmetrical migration rates. Our implementation can handle tens of thousands of loci, making it feasible to analyze genome‐scale data sets to test for gene flow. We calculate the posterior probabilities of gene trees at individual loci to identify genomic regions that are likely to have been transferred between species through gene flow. We conduct a simulation study to examine the statistical properties of the likelihood ratio test for gene flow between the two in‐group species and of the ML estimates of model parameters such as the migration rate. Including data from a third, out‐group species was found to dramatically increase the power of the test and the precision of parameter estimation. We compiled and analyzed several genomic data sets from Drosophila fruit flies. Our analyses suggest no migration from D. melanogaster to D. simulans, and a significant amount of gene flow from D. simulans to D. melanogaster, at a rate of ∼0.02 migrant individuals per generation. We discuss the utility of the multispecies coalescent model for species tree estimation, accounting for incomplete lineage sorting and migration.
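To make the two integration steps concrete, here is a minimal sketch of the likelihood for a single locus with two sequences. It uses a deliberately simplified two-population version of the model: only the in-group pair is modeled, with no out-group. The sketch assumes the JC69 substitution model, time measured in expected mutations per site, population-size parameters theta = 4Nμ, and a hypothetical parameterization in which m12 and m21 are per-lineage migration rates backward in time in mutation-time units. This is not the migrants-per-generation scale reported above. The migration history is integrated out analytically by exponentiating the generator of a four-state Markov chain (both lineages in population 1, one in each, both in population 2, coalesced). The coalescent time is integrated out numerically with Gauss-Legendre quadrature on [0, tau] and Gauss-Laguerre quadrature on the exponential tail in the ancestral population. The specific quadrature rules, function names, and parameter values are illustrative choices, not the authors' implementation.

```python
import math

import numpy as np

# Hypothetical sketch: two lineages in a two-population isolation-with-migration
# model (in-group pair only). Time is in expected mutations per site.
S11, S12, S22, COAL = range(4)  # both in pop 1, one in each, both in pop 2, coalesced


def expm(a, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.abs(a).sum(axis=1).max()
    s = max(int(math.ceil(math.log2(norm))) + 1, 0) if norm > 0 else 0
    b = a / 2.0**s  # scaled so the Taylor series converges quickly
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ b / k
        result = result + term
    for _ in range(s):  # undo the scaling: exp(a) = exp(a / 2**s) ** (2**s)
        result = result @ result
    return result


def generator(theta1, theta2, m12, m21):
    """Rate matrix of the two-lineage chain before the split (assumed parameterization).

    m12: rate at which a lineage in pop 1 moves to pop 2, backward in time.
    m21: rate at which a lineage in pop 2 moves to pop 1, backward in time.
    """
    q = np.zeros((4, 4))
    q[S11, S12] = 2 * m12      # either lineage in pop 1 migrates to pop 2
    q[S11, COAL] = 2 / theta1  # the pair coalesces in pop 1
    q[S12, S22] = m12          # the lineage in pop 1 migrates to pop 2
    q[S12, S11] = m21          # the lineage in pop 2 migrates to pop 1
    q[S22, S12] = 2 * m21      # either lineage in pop 2 migrates to pop 1
    q[S22, COAL] = 2 / theta2  # the pair coalesces in pop 2
    np.fill_diagonal(q, -q.sum(axis=1))  # rows of a generator sum to zero
    return q


def jc69_pair_likelihood(t, x, n):
    """P(x differences at n sites | coalescent time t) under JC69; branch length 2t."""
    p = 0.75 - 0.75 * math.exp(-8.0 * t / 3.0)
    return p**x * (1.0 - p) ** (n - x)


def locus_likelihood(x, n, start, theta1, theta2, theta_a, tau, m12, m21, nodes=32):
    """Integrate the locus likelihood over the coalescent time t."""
    q = generator(theta1, theta2, m12, m21)
    total = 0.0
    # Coalescence before the split, t in [0, tau]: Gauss-Legendre mapped from [-1, 1].
    u, w = np.polynomial.legendre.leggauss(nodes)
    for uk, wk in zip(u, w):
        t = tau / 2.0 * (uk + 1.0)
        # Density of coalescence at t: sum_i P(start -> i, t) * coalescence rate of i.
        density = expm(q * t)[start] @ q[:, COAL]
        total += wk * tau / 2.0 * density * jc69_pair_likelihood(t, x, n)
    # Lineages still separate at tau coalesce in the ancestor at rate 2 / theta_a.
    p_uncoal = 1.0 - expm(q * tau)[start, COAL]
    rate = 2.0 / theta_a
    v, wv = np.polynomial.laguerre.laggauss(nodes)  # integrates e^{-v} g(v) on [0, inf)
    for vk, wk in zip(v, wv):
        total += p_uncoal * wk * jc69_pair_likelihood(tau + vk / rate, x, n)
    return total


if __name__ == "__main__":
    # Illustrative values only: one sequence from each species, 3 differences at 500 sites.
    print(locus_likelihood(3, 500, S12, 0.01, 0.01, 0.01, 0.005, m12=10.0, m21=5.0))
```

Summing log-likelihoods of this kind over loci and maximizing over the parameters would give ML estimates in this simplified setting. The hand-written `expm` helper only keeps the example numpy-only; `scipy.linalg.expm` is the usual library choice.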
