The complexity of reconstructing trees from qualitative characters and subtrees

In taxonomy and other branches of classification it is useful to know when tree-like classifications on overlapping sets of labels can be consistently combined into a parent tree. This paper considers the computational complexity of this problem. Recognizing when a consistent parent tree exists is shown to be intractable (NP-complete) for sets of unrooted trees, even when each tree in the set classifies just four labels. Consequently, determining the compatibility of qualitative characters and partial binary characters is, in general, also NP-complete. However, for sets of rooted trees, an algorithm is described which constructs the “strict consensus tree” of all consistent parent trees (when they exist) in polynomial time. The related question of recognizing when a set of subtrees uniquely defines a parent tree is also considered, and a simple necessary and sufficient condition is described for rooted trees.
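To make the tractable rooted case concrete, the following is a minimal sketch of the BUILD procedure due to Aho et al., which decides in polynomial time whether a set of rooted three-leaf subtrees can be combined into one parent tree. It is not the strict consensus construction described in this paper. The function name `build`, the encoding of a rooted triple ab|c as the tuple `(a, b, c)`, and the nested-tuple output are assumptions made for illustration only.

```python
def build(leaves, triples):
    """Sketch of BUILD (Aho et al.): return a rooted tree, as nested tuples,
    that displays every triple, or None if the triples are inconsistent.

    A triple (a, b, c) encodes ab|c: a and b are grouped apart from c.
    This encoding is an assumption of this example.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the current leaf set.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Keep only triples whose three labels all lie in the current leaf set.
    relevant = [t for t in triples if set(t) <= leaves]

    # For each triple ab|c, join a and b: they must sit below the same child.
    for a, b, _c in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A single component means the root cannot be split: inconsistent input.
    if len(components) == 1:
        return None

    subtrees = []
    for comp in sorted(components.values(), key=sorted):
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


if __name__ == "__main__":
    print(build("abcd", [("a", "b", "c"), ("c", "d", "a")]))
    print(build("abc", [("a", "b", "c"), ("b", "c", "a")]))
```

The first call combines ab|c and cd|a into a single parent tree; the second returns `None` because ab|c and bc|a contradict each other. No comparably simple recursion is available for unrooted four-leaf trees, which is the case the abstract reports as NP-complete.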
