Are the Duplication Cost and Robinson-Foulds Distance Equivalent?

In the tree reconciliation approach to species tree inference, a tree with the minimum reconciliation score for the given gene trees is taken as an estimate of the species tree. The scoring models used in existing tree reconciliation methods include the duplication, mutation, and deep coalescence costs. Because existing inference methods are all heuristic, their performance is often evaluated using the Robinson-Foulds (RF) distance between the true species trees and the estimates produced on simulated multi-locus datasets. To better understand these methods, we study the relationship between the duplication cost and the RF distance. We prove that the gap between the duplication cost and the RF distance is unbounded, but that the symmetric duplication cost is logarithmically equivalent to the RF distance. The relationships between other reconciliation costs and the RF distance are also investigated.
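To make the two quantities concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes rooted trees written as nested tuples with species names as leaf strings, computes the rooted RF distance as the number of clusters found in exactly one of the two trees (some conventions halve this value), and counts duplications with the standard LCA mapping: an internal gene-tree node is a duplication when it maps to the same species-tree node as one of its children. The function names `clusters`, `rf_distance`, and `duplication_cost` are illustrative choices, not names from the source.

```python
def clusters(tree):
    """Return (leaf set of the tree, list of leaf sets of its internal nodes)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    internal = []
    for child in tree:
        child_leaves, child_internal = clusters(child)
        leaves |= child_leaves
        internal.extend(child_internal)
    internal.append(leaves)
    return leaves, internal


def rf_distance(t1, t2):
    """Rooted RF distance: number of clusters present in exactly one tree."""
    return len(set(clusters(t1)[1]) ^ set(clusters(t2)[1]))


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that the LCA mapping marks as duplications."""
    species_leaves, internal = clusters(species_tree)
    # Every species-tree node is identified by its cluster (leaves included).
    nodes = internal + [frozenset([leaf]) for leaf in species_leaves]

    def lca(leaf_set):
        # Clusters containing leaf_set are nested, so the smallest is the LCA.
        return min((c for c in nodes if leaf_set <= c), key=len)

    def visit(node):
        # Returns (leaf set below node, duplications at or below node).
        if isinstance(node, str):
            return frozenset([node]), 0
        leaves, count, child_images = frozenset(), 0, []
        for child in node:
            child_leaves, child_count = visit(child)
            leaves |= child_leaves
            count += child_count
            child_images.append(lca(child_leaves))
        if lca(leaves) in child_images:
            count += 1  # node and one child map to the same species node
        return leaves, count

    return visit(gene_tree)[1]


# Hypothetical four-species example, chosen only for illustration.
S = (("a", "b"), ("c", "d"))
G = (("a", "c"), ("b", "d"))
print(rf_distance(G, S))       # 4
print(duplication_cost(G, S))  # 1
print(duplication_cost(S, G))  # 1
```

On this small pair the trees share no cluster other than the root, giving an RF distance of 4, while only the root of either tree is a duplication when reconciled against the other. The example illustrates the definitions only; it does not demonstrate the unboundedness or equivalence results stated above.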
