The Deep Coalescence Consensus Tree Problem is Pareto on Clusters

Phylogenetic methods must account for the biological processes that create incongruence between gene trees and the species phylogeny. Deep coalescence, or incomplete lineage sorting, creates discord among gene trees at the early stages of species divergence, or when the time between speciation events was short and the ancestral population sizes were large. The deep coalescence problem takes a collection of gene trees and seeks the species tree that implies the fewest deep coalescence events, that is, the smallest deep coalescence reconciliation cost. Although this approach can be useful for phylogenetics, the consensus properties of the problem are largely uncharacterized, and the accuracy of heuristics is untested. We prove that the deep coalescence consensus tree problem satisfies the highly desirable Pareto property for clusters (clades): in every instance, each cluster that is present in all of the input gene trees, called a consensus cluster, is also found in every optimal solution. We introduce an efficient algorithm that, given a candidate species tree that does not display the consensus clusters, modifies the candidate tree so that it includes all of these clusters and has a lower (better) deep coalescence cost. Simulation experiments demonstrate the efficacy of this algorithm, but they also indicate that, even with large trees, most solutions returned by the recent efficient heuristic already display the consensus clusters.
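The notions of consensus cluster and deep coalescence cost can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the algorithm introduced in this paper: rooted trees are encoded as nested tuples of leaf names, every gene tree has exactly one leaf per species, and the cost is counted as "extra lineages" summed over species tree edges. All function names, the toy trees, and the hand-picked alternative tree `with_ab` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]
Cluster = FrozenSet[str]


def clusters(tree: Tree) -> Set[Cluster]:
    """Return the leaf set below every node of a rooted tree."""
    found: Set[Cluster] = set()

    def visit(node: Tree) -> Cluster:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def consensus_clusters(gene_trees: List[Tree]) -> Set[Cluster]:
    """Clusters present in every input gene tree (needs at least one tree)."""
    return set.intersection(*(clusters(t) for t in gene_trees))


def displays_all(species_tree: Tree, required: Set[Cluster]) -> bool:
    """True if the species tree contains every required cluster."""
    return required <= clusters(species_tree)


def extra_lineages(gene_tree: Tree, species_tree: Tree) -> int:
    """Deep coalescence cost, counted as extra lineages per species edge.

    Assumes both trees have the same leaf set. For the edge above a species
    node with cluster `pop`, the lineages leaving it are the maximal gene
    tree clusters contained in `pop`; all but one of them are "extra".
    """
    gene_clusters = clusters(gene_tree)
    species_clusters = clusters(species_tree)
    root = max(species_clusters, key=len)  # the root has no edge above it
    cost = 0
    for pop in species_clusters:
        if pop == root:
            continue
        inside = [g for g in gene_clusters if g <= pop]
        maximal = [g for g in inside if not any(g < h for h in inside)]
        cost += len(maximal) - 1
    return cost


def total_cost(gene_trees: List[Tree], species_tree: Tree) -> int:
    """Sum of the deep coalescence costs over all gene trees."""
    return sum(extra_lineages(g, species_tree) for g in gene_trees)


# Toy input: every gene tree contains the cluster {a, b}.
gene_trees: List[Tree] = [
    (("a", "b"), ("c", "d")),
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
]
required = consensus_clusters(gene_trees)

candidate: Tree = ((("a", "c"), "b"), "d")  # lacks the cluster {a, b}
with_ab: Tree = ((("a", "b"), "c"), "d")    # hand-picked tree displaying it

for name, tree in [("candidate", candidate), ("with_ab", with_ab)]:
    print(name, displays_all(tree, required), total_cost(gene_trees, tree))
```

In this toy instance the tree that displays the consensus cluster {a, b} has a total cost of 2, against 5 for the candidate that lacks it, which is consistent with the Pareto property stated above. The second tree was chosen by hand for illustration; it is not produced by the modification algorithm described in the abstract.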
