Convex Recolorings of Strings and Trees: Definitions, Hardness Results and Algorithms

A coloring of a tree is convex if the vertices of each color induce a connected subtree. Convex colorings of trees arise in areas such as phylogenetics and linguistics; for example, a perfect phylogenetic tree is one in which the states of each character induce a convex coloring of the tree. When a coloring of a tree is not convex, it is desirable to know "how far" it is from a convex one, and which convex colorings are "closest" to it. In this paper we study a natural definition of this distance: the recoloring distance, which is the minimal number of color changes at the vertices needed to make the coloring convex. We show that finding this distance is NP-hard even for a path, and also for some other interesting variants of the problem. On the positive side, we present algorithms for computing the recoloring distance under some natural generalizations of this concept: the uniform weighted model and the non-uniform model. Our first algorithms find optimal convex recolorings of strings and bounded-degree trees under the non-uniform model in linear time for any fixed number of colors. Next, we improve these algorithms for the uniform model so that they run in linear time for any fixed number of bad colors. Finally, we generalize the above result to hold for trees of unbounded degree.
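To make the definitions concrete for the simplest case, a string (a path), the following minimal Python sketch checks convexity and computes the unweighted recoloring distance by exhaustive search. It is an illustration only, not one of the paper's algorithms: the function names `is_convex` and `recoloring_distance` are hypothetical, the search is exponential in the string length, and it assumes that a recoloring may only use colors already present in the input.

```python
from itertools import product


def is_convex(coloring):
    """Return True if every color occupies one contiguous block of the string."""
    seen = set()      # colors whose block has already started
    previous = None   # color of the preceding position
    for color in coloring:
        if color != previous:
            if color in seen:
                # This color already had a block that ended earlier.
                return False
            seen.add(color)
            previous = color
    return True


def recoloring_distance(coloring):
    """Brute-force the minimal number of color changes needed for convexity.

    Assumption: candidate recolorings only use colors present in the input.
    Runs in time exponential in len(coloring); for tiny examples only.
    """
    colors = sorted(set(coloring))
    best = len(coloring)
    for candidate in product(colors, repeat=len(coloring)):
        if is_convex(candidate):
            changes = sum(a != b for a, b in zip(coloring, candidate))
            best = min(best, changes)
    return best


if __name__ == "__main__":
    print(is_convex("aabbc"))            # True
    print(is_convex("aba"))              # False
    print(recoloring_distance("aba"))    # 1
    print(recoloring_distance("ababa"))  # 2
```

For example, "aba" is not convex because the color a occupies two separate blocks; changing the middle position to a yields "aaa", so its recoloring distance is 1.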
