Efficiently Calculating Evolutionary Tree Measures Using SAT

We develop techniques to calculate important measures in evolutionary biology by encoding to CNF formulas and using powerful SAT solvers. Comparing evolutionary trees is a necessary step in tree reconstruction algorithms, locating recombination and lateral gene transfer, and in analyzing and visualizing sets of trees. We focus on two popular comparison measures for trees: the hybridization number and the rooted subtree-prune-and-regraft (rSPR) distance. Both have recently been shown to be NP-hard, and efficient algorithms are needed to compute and approximate these measures. We encode these as a Boolean formula such that two trees have hybridization number k (or rSPR distance k ) if and only if the corresponding formula is satisfiable. We use state-of-the-art SAT solvers to determine if the formula encoding the measure has a satisfying assignment. Our encoding also provides a rich source of real-world SAT instances, and we include a comparison of several recent solvers (minisat, adaptg2wsat, novelty+p, Walksat, March KS and SATzilla).

[1]  Nicholas Chen,et al.  TreeJuxtaposer : Scalable Tree Comparison using Focus + Context with Guaranteed Visibility , 2006 .

[2]  W. H. Day Optimal algorithms for comparing trees with labeled leaves , 1985 .

[3]  Luay Nakhleh,et al.  RIATA-HGT: A Fast and Accurate Heuristic for Reconstructing Horizontal Gene Transfer , 2005, COCOON.

[4]  Bart Selman,et al.  Local search strategies for satisfiability testing , 1993, Cliques, Coloring, and Satisfiability.

[5]  Charles Semple,et al.  On the Computational Complexity of the Rooted Subtree Prune and Regraft Distance , 2005 .

[6]  Nicholas Hamilton,et al.  Phylogenetic identification of lateral genetic transfer events , 2006, BMC Evolutionary Biology.

[7]  Maria Luisa Bonet,et al.  Approximating Subtree Distances Between Phylogenies , 2006, J. Comput. Biol..

[8]  Yufeng Wu,et al.  A practical method for exact computation of subtree prune and regraft distance , 2009, Bioinform..

[9]  Tandy J. Warnow,et al.  Phylogenetic networks: modeling, reconstructibility, and accuracy , 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[10]  Tao Jiang,et al.  On the Complexity of Comparing Evolutionary Trees , 1996, Discret. Appl. Math..

[11]  Inês Lynce,et al.  Efficient Haplotype Inference with Boolean Satisfiability , 2006, AAAI.

[12]  Hideo Matsuda,et al.  fastDNAmL: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood , 1994, Comput. Appl. Biosci..

[13]  Heiko A. Schmidt,et al.  Phylogenetic trees from large datasets , 2003 .

[14]  D. Huson,et al.  Application of phylogenetic networks in evolutionary studies. , 2006, Molecular biology and evolution.

[15]  Marijn J. H. Heule,et al.  March_dl: Adding Adaptive Heuristics and a New Branching Strategy , 2006, J. Satisf. Boolean Model. Comput..

[16]  Niklas Sörensson,et al.  An Extensible SAT-solver , 2003, SAT.

[17]  M. Steel,et al.  Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees , 2001 .

[18]  Michael J. Sanderson,et al.  R8s: Inferring Absolute Rates of Molecular Evolution, Divergence times in the Absence of a Molecular Clock , 2003, Bioinform..

[19]  Harry Zhang,et al.  Combining Adaptive Noise and Look-Ahead in Local Search for SAT , 2007, SAT.

[20]  Charles Semple,et al.  Hybrids in real time. , 2006, Systematic biology.

[21]  Charles Semple,et al.  Computing the minimum number of hybridization events for a consistent evolutionary history , 2007, Discret. Appl. Math..

[22]  Simone Linz,et al.  A Reduction Algorithm for Computing The Hybridization Number of Two Trees , 2007, Evolutionary bioinformatics online.

[23]  Michael T. Hallett,et al.  Efficient algorithms for lateral gene transfer problems , 2001, RECOMB.

[24]  Kevin Leyton-Brown,et al.  SATzilla: Portfolio-based Algorithm Selection for SAT , 2008, J. Artif. Intell. Res..

[25]  Jerrold I. Davis,et al.  Phylogeny and subfamilial classification of the grasses (Poaceae) , 2001 .

[26]  Holger H. Hoos,et al.  UBCSAT: An Implementation and Experimentation Environment for SLS Algorithms for SAT & MAX-SAT , 2004, SAT.