Reconciliation with non-binary species trees.

Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family. The inferred duplication-loss histories provide valuable information for a broad range of biological applications, including ortholog identification, estimating gene duplication times, and rooting and correcting gene trees. While reconciliation for binary trees is a tractable and well-studied problem, there are no algorithms for reconciliation with non-binary species trees. Yet a striking proportion of species trees are non-binary. For example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and incomplete lineage sorting. We present the first algorithms for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Our algorithms use an efficient mapping from gene trees to species trees to infer the minimum number of duplications in O(|V(G)| · (k(S) + h(S))) time, where |V(G)| is the number of nodes in the gene tree, h(S) is the height of the species tree, and k(S) is the size of its largest polytomy. We also present a dynamic programming algorithm that additionally minimizes the total number of losses. Although this algorithm is exponential in the size of the largest polytomy, it performs well in practice for polytomies with an outdegree of 12 or less. Finally, we present a heuristic that estimates the minimal number of losses in polynomial time. In empirical tests, this heuristic finds an optimal loss history 99% of the time. Our algorithms have been implemented in NOTUNG, a robust, production-quality tree-fitting program, which provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
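
The overestimation described above can be illustrated with a minimal Python sketch. It implements only the classical LCA-mapping rule for binary reconciliation (a gene tree node is called a duplication when it maps to the same species tree node as one of its children), not the algorithms presented in this work. The nested-tuple tree encoding and all function names are assumptions made for illustration, and gene tree leaves are labeled directly with the species they were sampled from.

```python
# Trees are nested tuples; a leaf is a species name (string).
# Gene tree leaves carry the name of the species the gene comes from.

def leaf_set(tree):
    """Set of species names found at the leaves below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def species_clades(species_tree):
    """Leaf sets of every node in the species tree (leaves included)."""
    clades = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            clades.extend(species_clades(child))
    return clades

def lca_map(gene_node, clades):
    """Map a gene node to the smallest species clade containing its species.

    Clades containing a given set are nested, so the smallest is the LCA.
    """
    species = leaf_set(gene_node)
    return min((c for c in clades if species <= c), key=len)

def count_lca_duplications(gene_tree, species_tree):
    """Count duplications under the classical binary LCA rule."""
    clades = species_clades(species_tree)

    def walk(node):
        if isinstance(node, str):
            return 0
        here = lca_map(node, clades)
        is_dup = any(lca_map(child, clades) == here for child in node)
        return int(is_dup) + sum(walk(child) for child in node)

    return walk(gene_tree)

gene = (("A", "B"), "C")

# Binary species tree that agrees with the gene tree: no duplication.
print(count_lca_duplications(gene, (("A", "B"), "C")))

# Same gene tree against an unresolved polytomy (A, B, C): the classical
# rule reports a duplication, although the gene tree agrees with one
# possible binary resolution of the polytomy.
print(count_lca_duplications(gene, ("A", "B", "C")))
```

In the second call the node (A, B) maps to the polytomy itself, as does the gene tree root, so the binary rule flags the root as a duplication. This is the kind of spurious inference that motivates treating polytomies separately.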
