Faster computation of the Robinson-Foulds distance between phylogenetic networks

The Robinson-Foulds distance, the most widely used metric for comparing phylogenetic trees, has recently been generalized to phylogenetic networks. Given two networks N1 and N2 with n leaves, m nodes, and e edges, the Robinson-Foulds distance measures the number of clusters of descendant leaves that are not shared by N1 and N2. The fastest previously known algorithm for computing the Robinson-Foulds distance between two such networks runs in O(m(m + e)) time. In this paper, we improve the time complexity to O(n(m + e)/log n) for general networks, to O(nm/log n) for general networks with bounded degree, and to optimal O(m + e) time for planar phylogenetic networks and bounded-level phylogenetic networks. We also introduce the natural concept of the minimum spread of a phylogenetic network and show how the running time of our new algorithm depends on this parameter. As an example, we prove that the minimum spread of a level-k phylogenetic network is at most k + 1, which implies that for two level-k phylogenetic networks, our algorithm runs in O((k + 1)(m + e)) time.
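
To make the cluster-based definition concrete, the following is a minimal sketch of a straightforward baseline, not the faster algorithm of this paper. It runs one graph search per node, which matches the O(m(m + e)) bound quoted above. The adjacency-dictionary representation, the function names `clusters` and `rf_distance`, and the choice to treat the clusters of a network as a plain set (rather than a multiset or a halved count, as some definitions do) are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

# Hypothetical representation: each node maps to the list of its children.
# A node with no children is a leaf and is identified by its own label.
Network = Dict[Hashable, List[Hashable]]


def clusters(network: Network) -> Set[FrozenSet[Hashable]]:
    """Return the set of descendant-leaf clusters, one cluster per node."""
    nodes = set(network)
    for children in network.values():
        nodes.update(children)

    result: Set[FrozenSet[Hashable]] = set()
    for start in nodes:
        # One search per node: O(m + e) each, O(m(m + e)) overall.
        seen = {start}
        stack = [start]
        leaves = set()
        while stack:
            node = stack.pop()
            children = network.get(node, [])
            if not children:
                leaves.add(node)
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        result.add(frozenset(leaves))
    return result


def rf_distance(n1: Network, n2: Network) -> int:
    """Count the clusters that occur in exactly one of the two networks."""
    return len(clusters(n1) ^ clusters(n2))


# A tree ((a,b),c) and a network in which leaf b hangs below a reticulation h.
tree = {"r": ["x", "c"], "x": ["a", "b"], "a": [], "b": [], "c": []}
net = {"r": ["x", "y"], "x": ["a", "h"], "y": ["h", "c"], "h": ["b"],
       "a": [], "b": [], "c": []}

print(rf_distance(tree, net))  # prints 1: only the cluster {b, c} is unshared
```

In this toy example the two inputs share the clusters {a, b, c}, {a, b}, {a}, {b}, and {c}; the network additionally contains {b, c} at node y, so the distance under the assumed set-based convention is 1.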
