On contract-and-refine transformations between phylogenetic trees

Inferring evolutionary trees by attempting to solve the maximum parsimony (MP) and maximum likelihood (ML) optimization problems is a standard part of much biological data analysis. However, both problems are hard to solve: MP is provably NP-hard, and ML is even harder in practice. Consequently, hill-climbing heuristics are used to analyze datasets for phylogeny reconstruction. Two primary topological transformations have been used in the most popular heuristics: TBR (tree-bisection-and-reconnection) and ECR (edge-contractions-and-refinements). While most popular heuristics use TBR moves exclusively to explore tree space, some recent methods have combined ECR with TBR and found significant improvements in the speed and accuracy with which they can analyze datasets. In this paper we analyze ECR moves in detail and provide results on the diameter of the tree space under ECR, the intersection of the ECR neighborhood with the TBR neighborhood, a structural analysis of the ECR operation, and an efficient method for sampling uniformly from the 2-ECR neighborhood of a tree. Our results should lead to a better understanding of the impact of ECR moves on the performance of heuristic searches.
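
To make the ECR operation concrete, the following is a minimal sketch of the simplest case: contracting a single internal edge of a binary tree and then re-refining it. It is an illustration only, not the paper's algorithm. It assumes a tree on leaves 0..n-1 is stored as its set of non-trivial splits, each recorded as the side that excludes leaf 0; this representation and the names `compatible`, `nontrivial_splits` and `one_ecr_neighbors` are hypothetical choices made here. The enumeration is brute force and only suitable for small n.

```python
from itertools import combinations

def compatible(a, b):
    """Two splits (each stored as the side excluding leaf 0) can coexist
    in one tree iff those sides are nested or disjoint."""
    return a <= b or b <= a or not (a & b)

def nontrivial_splits(n):
    """All non-trivial splits of leaves 0..n-1, as the side without leaf 0."""
    others = range(1, n)
    return [frozenset(c)
            for k in range(2, n - 1)          # side sizes 2 .. n-2
            for c in combinations(others, k)]

def one_ecr_neighbors(tree, n):
    """Binary trees reachable by contracting one internal edge of `tree`
    and refining the result again (brute-force enumeration)."""
    neighbors = set()
    for s in tree:
        contracted = tree - {s}               # contraction: drop one edge's split
        for c in nontrivial_splits(n):        # refinement: add a different split
            if (c != s and c not in contracted
                    and all(compatible(c, t) for t in contracted)):
                neighbors.add(contracted | {c})
    return neighbors

# Five-leaf example: the binary tree with splits {1,2} and {3,4}.
tree = frozenset({frozenset({1, 2}), frozenset({3, 4})})
print(len(one_ecr_neighbors(tree, 5)))  # 4, i.e. 2 * (n - 3)
```

Contracting one edge of a binary tree leaves a single degree-4 vertex, which has three binary refinements, one of them the original tree. Each of the n-3 internal edges therefore contributes two new trees, which is why the example reports 2(n-3) neighbors. Contracting two or more edges enlarges the set of possible refinements accordingly.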
