A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction

Gene family evolution is determined by microevolutionary processes (e.g., point mutations) and macroevolutionary processes (e.g., gene duplication and loss), yet macroevolutionary considerations are rarely incorporated into gene phylogeny reconstruction methods. We present a dynamic program to find the most parsimonious gene family tree with respect to a macroevolutionary optimization criterion, the weighted sum of the number of gene duplications and losses. The existence of a polynomial delay algorithm for duplication/loss phylogeny reconstruction stands in contrast to most formulations of phylogeny reconstruction, which are NP-complete. We next extend this result to obtain a two-phase method for gene tree reconstruction that takes both micro- and macroevolution into account. In the first phase, a gene tree is constructed from sequence data, using any of the previously known algorithms for gene phylogeny construction. In the second phase, the tree is refined by rearranging regions that do not have strong support in the sequence data so as to minimize the duplication/loss cost. Components of the tree with strong support are left intact. This hybrid approach incorporates both micro- and macroevolutionary considerations, yet its computational requirements are modest in practice because the two-phase approach constrains the search space. Our hybrid algorithm can also be used to resolve nonbinary nodes in a multifurcating gene tree. We have implemented these algorithms in a software tool, NOTUNG 2.0, that can be used as a unified framework for gene tree reconstruction or as an exploratory analysis tool that can be applied post hoc to any rooted tree with bootstrap values. The NOTUNG 2.0 graphical user interface can be used to visualize alternate duplication/loss histories, root trees according to duplication and loss parsimony, manipulate and annotate gene trees, and estimate gene duplication times. It also offers a command-line option that enables high-throughput analysis of large numbers of trees.
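To make the optimization criterion concrete, the following is a minimal sketch of how the weighted duplication/loss cost of one fixed, rooted, binary gene tree could be scored against a binary species tree using the standard least-common-ancestor (LCA) reconciliation. It is not the dynamic program described above, which searches over tree rearrangements; it only evaluates the quantity being minimized. The nested-tuple tree encoding, the function names (`index_species_tree`, `lca`, `dl_counts`, `dl_cost`), the `species_of` mapping, and the weights 1.5 and 1.0 are illustrative assumptions, not taken from NOTUNG.

```python
def index_species_tree(tree):
    """Record the parent and depth of every node in a nested-tuple species tree."""
    parent, depth = {}, {}

    def visit(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):  # internal node: recurse into children
            for child in node:
                visit(child, node, d + 1)

    visit(tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def dl_counts(gene_tree, species_tree, species_of):
    """Return (duplications, losses) implied by LCA reconciliation.

    Assumes both trees are rooted and binary, species names are unique,
    and species_of maps every gene leaf to a species leaf (hypothetical input).
    """
    parent, depth = index_species_tree(species_tree)
    dups = 0
    losses = 0

    def visit(g):
        nonlocal dups, losses
        if not isinstance(g, tuple):  # gene leaf maps to its species
            return species_of[g]
        left, right = (visit(c) for c in g)
        here = lca(left, right, parent, depth)
        # A node is a duplication if it maps to the same species node as a child.
        is_dup = here == left or here == right
        if is_dup:
            dups += 1
        for child in (left, right):
            if child != here:
                gap = depth[child] - depth[here]
                # Each skipped species-tree level implies one loss; below a
                # duplication the first level counts as well.
                losses += gap if is_dup else gap - 1
        return here

    visit(gene_tree)
    return dups, losses


def dl_cost(gene_tree, species_tree, species_of, dup_weight=1.5, loss_weight=1.0):
    """Weighted sum of duplications and losses (weights are placeholders)."""
    dups, losses = dl_counts(gene_tree, species_tree, species_of)
    return dup_weight * dups + loss_weight * losses


# Usage: species tree ((A,B),C) and a gene tree with one ancient duplication.
species = (("A", "B"), "C")
species_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
gene = (("a1", "b1"), ("a2", "c1"))
print(dl_counts(gene, species, species_of))  # (1, 2)
```

In this example the root of the gene tree is a duplication, one copy has lost its C ortholog and the other its B ortholog, giving one duplication and two losses. A rearrangement step of the kind described in the abstract would compare such costs across alternative topologies for the weakly supported regions of the tree.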
