Inferring and validating horizontal gene transfer events using bipartition dissimilarity.

Horizontal gene transfer (HGT) is one of the main mechanisms driving the evolution of microorganisms, and its accurate identification is one of the major challenges posed by reticulate evolution. In this article, we describe a new polynomial-time algorithm for inferring HGT events and compare three existing tree comparison indices and one new index in the context of HGT identification. The proposed algorithm can rely on different optimization criteria, including least squares (LS), the Robinson and Foulds (RF) distance, the quartet distance (QD), and the bipartition dissimilarity (BD), when searching for an optimal scenario of subtree prune and regraft (SPR) moves needed to transform the given species tree into the given gene tree. As the simulation results suggest, the algorithmic strategy based on BD, introduced in this article, generally provides better results than those based on LS, RF, and QD. The BD-based algorithm also proved to be more accurate and faster than RIATA-HGT, a well-known polynomial-time heuristic. Moreover, the HGT recovery results yielded by BD were generally equivalent to those provided by the exponential-time algorithm LatTrans, but with a clear gain in running time. Finally, a statistical framework for assessing the reliability of the obtained HGTs by bootstrap analysis is also presented.
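
To make two of the tree comparison indices named above concrete, the following minimal Python sketch computes the RF distance and a bipartition dissimilarity between a species tree and a gene tree. It is an illustration under stated assumptions, not the article's implementation: trees are written as nested tuples of leaf names, the function names are hypothetical, and the BD formula is an assumption, since this abstract does not define it. The formula used averages, over both directions, the Hamming distance from each internal bipartition of one tree to its closest bipartition (or that bipartition's complement) in the other tree.

```python
def leaf_set(tree):
    """Return the set of leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the nontrivial splits of the tree, viewed as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    anchor leaf, so a split and its complement map to the same key.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Keep only internal branches: both sides need at least 2 leaves.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def rf_distance(t1, t2):
    """Robinson and Foulds distance: splits present in exactly one tree."""
    return len(bipartitions(t1) ^ bipartitions(t2))


def bipartition_dissimilarity(t1, t2):
    """Assumed BD formula (see lead-in); not taken from this abstract."""
    n = len(leaf_set(t1))
    b1, b2 = bipartitions(t1), bipartitions(t2)
    if not b1 or not b2:
        raise ValueError("both trees need at least one internal branch")

    def split_distance(a, b):
        d = len(a ^ b)          # Hamming distance between a and b
        return min(d, n - d)    # n - d is the distance to b's complement

    def one_way(src, dst):
        return sum(min(split_distance(a, b) for b in dst) for a in src)

    return (one_way(b1, b2) + one_way(b2, b1)) / 2


# Hypothetical example: the gene tree differs from the species tree
# by one SPR move that regrafts leaf "B" next to leaf "E".
species = ((("A", "B"), "C"), ("D", ("E", "F")))
gene = (("A", "C"), ("D", (("E", "B"), "F")))

print(rf_distance(species, gene))                # 6
print(bipartition_dissimilarity(species, gene))  # 4.0
```

In this example a single SPR move changes every internal bipartition, so RF takes its maximum value for six leaves, whereas the BD value reflects that each changed bipartition is still close to one in the other tree.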
