Bacterial Phylogenetic Reconstruction from Whole Genomes Is Robust to Recombination but Demographic Inference Is Not

ABSTRACT Phylogenetic inference in bacterial genomics is fundamental to understanding problems such as population history, antimicrobial resistance, and transmission dynamics. The field has been plagued by an apparent state of contradiction since the distorting effects of recombination on phylogeny were discovered more than a decade ago. Researchers persist with detailed phylogenetic analyses while simultaneously acknowledging that recombination seriously misleads inference of population dynamics and selection. Here we resolve this paradox by showing that phylogenetic tree topologies based on whole genomes robustly reconstruct the clonal frame topology but that branch lengths are badly skewed. Surprisingly, removing recombining sites can exacerbate branch length distortion caused by recombination.

IMPORTANCE Phylogenetic tree reconstruction is a popular approach for understanding the relatedness of bacteria in a population from differences in their genome sequences. However, bacteria frequently exchange regions of their genomes by a process called homologous recombination, which violates a fundamental assumption of phylogenetic methods. Since many researchers continue to use phylogenetics for recombining bacteria, it is important to understand how recombination affects the conclusions drawn from these analyses. We find that whole-genome sequences afford great accuracy in reconstructing evolutionary relationships despite concerns surrounding the presence of recombination, but the branch lengths of the phylogenetic tree are indeed badly distorted. Surprisingly, methods to reduce the impact of recombination on branch lengths can exacerbate the problem.
