Does gene flow destroy phylogenetic signal? The performance of three methods for estimating species phylogenies in the presence of gene flow.

Incomplete lineage sorting has been documented across a diverse set of taxa ranging from songbirds to conifers. Such patterns are expected theoretically for species with certain life history characteristics (e.g. long generation times) and for those influenced by certain historical demographic events (e.g. recent divergences). A number of methods for estimating the underlying species phylogeny from a set of gene trees have been proposed and shown to be effective when incomplete lineage sorting has occurred. The further effects of gene flow on those methods, however, remain to be investigated. Here, we focus on the performance of three methods of species tree inference, ESP-COAL, minimizing deep coalescence (MDC), and concatenation, when incomplete lineage sorting and gene flow jointly confound the relationship between gene and species trees. Performance was investigated using Monte Carlo coalescent simulations under four models (n-island, stepping stone, parapatric, and allopatric) and three magnitudes of gene flow (N(e)m = 0.01, 0.10, 1.00). Although results varied by the model and magnitude of gene flow, methods incorporating aspects of the coalescent process (ESP-COAL and MDC) performed well, with probabilities of identifying the correct species tree topology typically increasing to greater than 0.75 when five or more loci were sampled. The only exceptions to that pattern involved gene flow at moderate to high magnitudes under the n-island and stepping stone models. Concatenation performed poorly relative to the other methods. We extend these results to a discussion of the importance of species and population phylogenies to the fields of molecular systematics and phylogeography, using an empirical example from Rhododendron.
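To give a feel for how incomplete lineage sorting alone can separate a gene tree from the species tree, the following is a minimal Monte Carlo sketch in Python. It is not the simulation design used in the study: it assumes a three-species tree ((A,B),C), one sampled lineage per species, constant population size, and no gene flow. The function names and the parameter `t_internal` (the internal branch length in coalescent units) are illustrative assumptions. The simulated concordance probability is compared with the standard three-taxon coalescent result, 1 - (2/3)exp(-T).

```python
import math
import random


def simulate_concordance(t_internal, n_reps=20000, seed=1):
    """Estimate the fraction of gene trees matching species tree ((A,B),C).

    Assumptions (illustrative only): one lineage per species, no gene flow,
    and t_internal measured in coalescent units.
    """
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_reps):
        # The A and B lineages share the ancestral AB population for
        # t_internal units; two lineages coalesce after an Exp(1) wait.
        if rng.expovariate(1.0) < t_internal:
            matches += 1  # A and B coalesce before C joins: concordant
        # Otherwise all three lineages reach the root population, where
        # each of the three pairs is equally likely to coalesce first.
        elif rng.randrange(3) == 0:
            matches += 1  # the (A,B) pair happened to coalesce first
    return matches / n_reps


def expected_concordance(t_internal):
    """Analytical concordance probability for the three-taxon case."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t_internal)


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        sim = simulate_concordance(t)
        print(f"T={t:.1f}  simulated={sim:.3f}  expected={expected_concordance(t):.3f}")
```

Short internal branches drive the concordance probability toward 1/3, which is why several loci are usually needed to recover the species tree. Adding gene flow, as in the four migration models above, would need a structured coalescent simulator rather than this sketch.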
