NOTUNG: A Program for Dating Gene Duplications and Optimizing Gene Family Trees

Large-scale gene duplication is a major force driving the evolution of new genetic functions. Whole-genome duplications are widely believed to have played an important role in the evolution of the maize, yeast, and vertebrate genomes. Evolutionary trees provide a powerful tool for studying this process: they can be used to analyze the history of gene duplication and to estimate duplication times. Many studies in the molecular evolution literature have applied this approach to small data sets, with the analyses performed by hand. The rapid growth of genetic sequence data will soon allow similar studies on a genomic scale, but such studies will be limited unless the analysis can be automated. Even existing data sets admit alternative hypotheses that would be too tedious to consider without automation. In this paper, we describe a program called NOTUNG that facilitates large-scale analysis using both rooted and unrooted trees. When tested on trees analyzed in the literature, NOTUNG consistently yielded results that agree with the assessments in the original publications. Thus, NOTUNG provides a basic building block for inferring duplication dates from gene trees automatically and can also be used as an exploratory analysis tool for evaluating alternative hypotheses.
