A new efficient algorithm for inferring explicit hybridization networks following the Neighbor-Joining principle

Many algorithms and software packages have been developed for inferring phylogenetic trees. However, some biological phenomena, such as hybridization, recombination, and horizontal gene transfer, cannot be represented by a tree topology; phylogenetic networks are needed to represent these evolutionary mechanisms adequately. In this article, we present a new efficient heuristic algorithm for inferring hybridization networks from matrices of evolutionary distances between species. Networks are built using the Neighbor-Joining principle and the least-squares criterion. At each step of the algorithm, before joining two nodes, we check whether a hybridization event could be associated with one or both of them. The proposed algorithm finds the exact tree solution when the considered distance matrix is a tree metric (i.e., when it is representable by a unique phylogenetic tree). It also provides very good hybrid recovery rates for large trees (with 32 and 64 leaves in our simulations) for both distance and sequence data. The results obtained with the new algorithm on real and simulated datasets are illustrated and discussed in detail.
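To make two notions from the abstract concrete, the following minimal Python sketch checks whether a distance matrix is a tree metric using the classical four-point condition, and selects the pair of nodes that a standard Neighbor-Joining step would join. It is an illustration only, not the algorithm proposed in the article: the hybridization check performed before each join is not shown, since the abstract does not specify it. The function names `is_tree_metric` and `nj_pick_pair` and the example matrix are hypothetical.

```python
from itertools import combinations


def is_tree_metric(d, tol=1e-9):
    """Four-point condition: for every four taxa, the two largest of the
    three pairwise distance sums must be equal (up to a tolerance)."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        if sums[2] - sums[1] > tol:
            return False
    return True


def nj_pick_pair(d):
    """Return the pair (i, j) minimising the standard Neighbor-Joining
    criterion Q(i, j) = (n - 2) * d(i, j) - r_i - r_j, where r_i is the
    sum of row i. Ties are broken by the first pair encountered."""
    n = len(d)
    r = [sum(row) for row in d]
    return min(combinations(range(n), 2),
               key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])


# Hypothetical example: distances induced by the tree ((A,B),(C,D)) with
# leaf branches A=1, B=2, C=3, D=4 and an internal branch of length 5.
tree_d = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

# The same matrix with d(A, C) perturbed, which breaks the condition.
non_tree_d = [row[:] for row in tree_d]
non_tree_d[0][2] = non_tree_d[2][0] = 12

print(is_tree_metric(tree_d))      # additive matrix
print(is_tree_metric(non_tree_d))  # perturbed matrix
print(nj_pick_pair(tree_d))        # one of the two cherries, (A,B) or (C,D)
```

On a four-leaf additive matrix both cherries attain the same criterion value, so either pair is a valid choice for the first join; the sketch simply returns the first one found.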
