An Ω(n²/log n) Speed-Up of TBR Heuristics for the Gene-Duplication Problem

The gene-duplication problem is to infer a species supertree from gene trees that are confounded by complex histories of gene duplications. This problem is NP-hard and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. We improve on the time complexity of the local search problem by a factor of n²/log n, where n is the size of the resulting species supertree. Typically, several thousand instances of the local search problem are solved throughout a stepwise heuristic search. Hence, our improvement makes the gene-duplication problem much more tractable for large-scale phylogenetic analyses.
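
To make the objective concrete, the sketch below counts the duplications needed to reconcile one gene tree with a candidate species tree, using the standard LCA-mapping definition from the reconciliation literature: a gene-tree node is a duplication when it maps to the same species-tree node as one of its children. This is only an illustration of the cost that a heuristic would evaluate for each candidate tree, not the sped-up local search algorithm of the paper. The nested-tuple tree encoding and the names `clusters` and `duplication_cost` are assumptions made for this example, and the LCA lookup is deliberately naive.

```python
# Assumed encoding (not from the paper): a tree is a nested tuple, and a leaf
# is a species name given as a string. Gene-tree leaves carry the name of the
# species the gene was sampled from.

def clusters(tree):
    """Return (leaf set of tree, list of the leaf sets of every node in tree)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves, all_sets = frozenset(), []
    for child in tree:
        child_leaves, child_sets = clusters(child)
        leaves |= child_leaves
        all_sets.extend(child_sets)
    all_sets.append(leaves)
    return leaves, all_sets


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species node as a child."""
    _, species_clusters = clusters(species_tree)

    def lca(leaf_set):
        # Naive LCA: the smallest species cluster containing leaf_set.
        # Assumes every gene-tree species occurs in the species tree.
        return min((c for c in species_clusters if leaf_set <= c), key=len)

    count = 0

    def visit(node):
        nonlocal count
        if isinstance(node, str):
            leaf = frozenset([node])
            return leaf, lca(leaf)
        results = [visit(child) for child in node]
        leaves = frozenset().union(*(r[0] for r in results))
        mapped = lca(leaves)
        if any(r[1] == mapped for r in results):
            count += 1  # node and a child share a species node: duplication
        return leaves, mapped

    visit(gene_tree)
    return count


species = (("a", "b"), "c")
print(duplication_cost((("a", "b"), "c"), species))  # congruent gene tree
print(duplication_cost((("a", "c"), "b"), species))  # conflicting gene tree
```

Under this definition, the cost of a candidate species tree for a whole data set would be the sum of `duplication_cost` over all input gene trees, which is the quantity a stepwise search would try to reduce at each step.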
