What is the danger of the anomaly zone for empirical phylogenetics?

The increasing number of observations of gene trees with discordant topologies in phylogenetic studies has raised awareness of the problem of incongruence between species trees and gene trees. Moreover, theoretical treatments of the impact of coalescent variance on phylogenetic study have identified situations in which the most probable gene trees do not match the underlying species tree (i.e., anomalous gene trees [AGTs]). However, although the theoretical proof of the existence of AGTs is alarming, the actual risk that AGTs pose to empirical phylogenetic study is far from clear. Establishing the conditions (i.e., the branch lengths in a species tree) under which AGTs are possible does not address the critical issue of how prevalent they might be. Furthermore, theoretical characterization of the species trees for which AGTs may pose a problem (i.e., the anomaly zone, or the species histories for which AGTs are theoretically possible) considers just one source of variance that contributes to discord between species trees and gene trees: gene lineage coalescence. Yet empirical data contain another important stochastic component: mutational variance. Estimated gene trees will differ from the underlying gene trees (i.e., the actual genealogies) because of the random process of mutation. Here, we take a simulation approach to investigate the prevalence of AGTs among estimated gene trees, thereby characterizing the boundaries of the anomaly zone while taking into account both coalescent and mutational variance. We also determine the frequency of realized AGTs, which is critical to putting the theoretical work on AGTs into a realistic biological context. Two salient results emerge from this investigation. First, our results show that mutational variance can indeed expand the parameter space (i.e., the relative branch lengths in a species tree) in which AGTs might be observed in empirical data. By exploring the underlying cause of the expanded anomaly zone, we identify aspects of empirical data relevant to avoiding the problems that AGTs pose for species tree inference from multilocus data. Second, for the empirical species histories in which AGTs are possible, unresolved trees, not AGTs, predominate in the pool of estimated gene trees. This result suggests that the risk posed by AGTs, although they exist in theory, may rarely be realized in practice. By considering the biological realities of both mutational and coalescent variance, the study refines, and redefines, the actual challenges for empirical phylogenetic study of recently diverged taxa that have speciated rapidly: AGTs themselves are unlikely to pose a significant danger to empirical phylogenetic study.
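
To make the phrase "the branch lengths in a species tree for which AGTs are possible" concrete, the following is a minimal sketch of the purely coalescent anomaly zone for the simplest case, an asymmetric four-taxon species tree (((A,B),C),D). It assumes the boundary function usually attributed to Degnan and Rosenberg (2006), with internal branch lengths x (deeper) and y (shallower) in coalescent units. The function names are hypothetical, and the sketch does not model mutational variance, so it describes only the theoretical zone and not the expanded zone reported in this study.

```python
import math


def anomaly_boundary(x: float) -> float:
    """Boundary a(x) of the four-taxon anomaly zone (assumed form, see lead-in).

    x is the length, in coalescent units, of the deeper internal branch of
    the asymmetric species tree (((A,B),C),D).
    """
    if x <= 0:
        raise ValueError("branch length x must be positive")
    numerator = 3.0 * math.exp(2.0 * x) - 2.0
    denominator = 18.0 * (math.exp(3.0 * x) - math.exp(2.0 * x))
    return math.log(2.0 / 3.0 + numerator / denominator)


def in_anomaly_zone(x: float, y: float) -> bool:
    """True if the shallower internal branch y lies below the boundary a(x).

    Under the assumed model, an anomalous gene tree exists only then.
    """
    return y < anomaly_boundary(x)


if __name__ == "__main__":
    # Short internal branches fall inside the zone; longer ones do not.
    for x, y in [(0.05, 0.05), (0.1, 0.01), (0.2, 0.2), (0.5, 0.5)]:
        print(f"x={x:.2f} y={y:.2f} in_zone={in_anomaly_zone(x, y)}")
```

Under this assumed boundary, a(x) is positive only for short deeper branches (roughly x below 0.27 coalescent units), which is why the anomaly zone is associated with rapid, recent speciation.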
