The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches

The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene-duplication events. This problem is NP-complete and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. A classical local search problem is the NNI search problem, which is based on the nearest neighbor interchange (NNI) operation. In this work, we (1) provide a novel near-linear time algorithm for the NNI search problem, (2) introduce extensions that significantly enlarge the search space of the NNI search problem, and (3) present algorithms for these extended versions that are asymptotically just as efficient as our algorithm for the NNI search problem. The speedup achieved for the extended NNI search problems makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the performance of our algorithms in a comparison study using sets of large, randomly generated gene trees.
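To make the NNI search problem concrete, the following is a minimal, deliberately naive sketch of a single local search step. It is not the near-linear time algorithm of this work: it simply rescores every NNI neighbor from scratch. The tree encoding (nested tuples with species names as leaf labels), the function names, and the use of the standard LCA-mapping definition of a duplication are all assumptions made for illustration.

```python
# Naive illustration only: rooted binary trees are nested 2-tuples,
# leaves are species names (strings). All names here are hypothetical.

def leaves(t):
    """Set of leaf labels below node t."""
    return frozenset([t]) if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

def clusters(t):
    """Leaf sets (clusters) of every node of t."""
    if isinstance(t, str):
        return [leaves(t)]
    return [leaves(t)] + clusters(t[0]) + clusters(t[1])

def dup_cost(gene, species):
    """Count gene-tree nodes whose LCA mapping into the species tree
    equals the mapping of one of their children (assumed definition)."""
    cl = clusters(species)

    def lca(s):
        # Clusters containing s are nested, so the smallest one is the LCA.
        return min((c for c in cl if s <= c), key=len)

    count = 0

    def walk(g):
        nonlocal count
        m = lca(leaves(g))
        if not isinstance(g, str):
            if m in (walk(g[0]), walk(g[1])):
                count += 1
        return m

    walk(gene)
    return count

def nni_neighbors(t):
    """All trees one rooted NNI away: swap a child of an internal
    node with that node's sibling."""
    if isinstance(t, str):
        return []
    l, r = t
    out = []
    if not isinstance(r, str):
        b, c = r
        out += [(b, (l, c)), (c, (b, l))]
    if not isinstance(l, str):
        a, b = l
        out += [((r, b), a), ((a, r), b)]
    out += [(x, r) for x in nni_neighbors(l)]
    out += [(l, x) for x in nni_neighbors(r)]
    return out

def best_nni_neighbor(species, gene_trees):
    """Exhaustive NNI search step: return (cost, tree) for the neighbor
    with the lowest total duplication cost over all gene trees."""
    def total(s):
        return sum(dup_cost(g, s) for g in gene_trees)
    best = min(nni_neighbors(species), key=total)
    return total(best), best
```

Each call to `best_nni_neighbor` rescores every neighbor against every gene tree, which is the kind of repeated work a faster algorithm for the NNI search problem is meant to avoid. A heuristic would call it repeatedly, moving to the returned tree while the total cost keeps decreasing.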
