Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss

Motivation: Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction.

Results: We present two new algorithms for the DTL reconciliation problem that are dramatically faster than existing algorithms, both asymptotically and in practice. We also extend the standard DTL reconciliation model by considering distance-dependent transfer costs, which allow for more accurate reconciliation, and we give an efficient algorithm for DTL reconciliation under this extended model. We implemented our new algorithms and demonstrated up to 100 000-fold speed-up over existing methods, using both simulated and biological datasets. This dramatic improvement makes it possible to use DTL reconciliation for performing rigorous evolutionary analyses of large gene families and enables its use in advanced reconciliation-based gene and species tree reconstruction methods.

Availability: Our programs can be freely downloaded from http://compbio.mit.edu/ranger-dtl/.

Contact: mukul@csail.mit.edu; manoli@mit.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
