Combinatorial algorithms for constructing phylogenetic trees
Phylogenetic trees are rooted trees that model the evolution of a set S of biological species from a common ancestor, where the leaves represent the species in S and the internal nodes represent ancestral species. A major endeavor in numerical taxonomy is the computation of phylogenetic trees from data on the species set, typically described in one of two ways: as distance matrices or as qualitative characters.
Distance matrices represent distances between pairs of species, in some suitably defined metric space. Computing trees given distance matrices is a well understood problem. We present a new model of computing phylogenetic trees, based upon experiments, that generalizes the distance matrix model. In our model, we assume that for any three species, an experiment can be performed that determines the true phylogeny for those three species. We analyze the complexity of determining phylogenetic trees in this model, and present tight upper and lower bounds.
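To make the experiment model concrete, the following is a minimal Python sketch of a simulated three-species experiment. It assumes a hidden rooted tree stored as a child-to-parent dictionary, and it assumes that an experiment reports the pair of species sharing the most recent common ancestor (or nothing if the triple is unresolved). The names `experiment`, `lca`, `depth`, and `ancestors` are hypothetical and chosen for illustration; this is not the construction algorithm analyzed in the work itself.

```python
def ancestors(parent, node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def depth(parent, node):
    """Number of edges between node and the root."""
    return len(ancestors(parent, node)) - 1


def lca(parent, a, b):
    """Lowest common ancestor of a and b in the hidden tree."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def experiment(parent, a, b, c):
    """Simulated experiment (assumed interface): return the pair of species
    that is most closely related, or None if the triple is unresolved."""
    pairs = [(a, b), (a, c), (b, c)]
    depths = [depth(parent, lca(parent, x, y)) for x, y in pairs]
    best = max(depths)
    if depths.count(best) > 1:
        # All three pairs meet at the same ancestor: no pair is closer.
        return None
    return frozenset(pairs[depths.index(best)])


# Hypothetical hidden tree: (((A, B), C), D), with internal nodes y, x, r.
hidden = {"A": "y", "B": "y", "y": "x", "C": "x", "x": "r", "D": "r"}
closest_abc = experiment(hidden, "A", "B", "C")
closest_acd = experiment(hidden, "A", "C", "D")
```

In this toy tree, the experiment on A, B, C singles out the pair {A, B}, and the experiment on A, C, D singles out {A, C}. An algorithm in this model would see only such answers, never the tree itself.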
Characters are equivalence relations on the species set, S, partitioning S into the distinct character states. Within this model, an optimal phylogenetic tree (satisfying certain constraints) has been defined, and is called a perfect phylogeny. The problem of determining whether a perfect phylogeny exists for a given set of species defined by characters is called the Perfect Phylogeny Problem, or the Character Compatibility Problem. Although this problem has been widely discussed in the biomathematical literature for decades, the only cases for which polynomial time solutions to the Perfect Phylogeny Problem were known previously were the cases of two characters, and binary (i.e. 2-state) characters.
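For the binary case mentioned above, a standard result is that a set of binary characters admits a perfect phylogeny exactly when every pair of characters is compatible, that is, when no pair exhibits all four state combinations (0,0), (0,1), (1,0), (1,1) across the species. The sketch below checks this condition naively in Python. The function names and the matrix layout (rows are species, columns are characters) are assumptions made for illustration, and the quadratic pairwise check is not claimed to be the fastest known method.

```python
from itertools import combinations


def pairwise_compatible(c1, c2):
    """Two binary characters are compatible iff at most three of the four
    possible state combinations occur among the species."""
    return len(set(zip(c1, c2))) < 4


def binary_perfect_phylogeny_exists(matrix):
    """matrix[i][j] is the state (0 or 1) of species i for character j."""
    columns = list(zip(*matrix))
    return all(pairwise_compatible(a, b) for a, b in combinations(columns, 2))


# Hypothetical data: four species, three binary characters.
compatible = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
# Two characters showing all four combinations, hence incompatible.
incompatible = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
]
```

Here `binary_perfect_phylogeny_exists(compatible)` evaluates to `True` and `binary_perfect_phylogeny_exists(incompatible)` to `False`. The pairwise shortcut is specific to binary characters; it does not carry over to characters with more states.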
In 1974, Buneman showed that the Perfect Phylogeny Problem reduced in polynomial time to a graph-theoretic problem, called the Triangulating Colored Graphs Problem. We show that these two problems are NP-Complete, and present polynomial time algorithms for several special cases of each problem. In particular, we present an $O(n^2 k)$ algorithm to construct a perfect phylogeny from n aligned DNA sequences of length k.