Maximum likelihood models and algorithms for gene tree evolution with duplications and losses

Background: The abundance of new genomic data provides the opportunity to map the locations of gene duplication and loss events onto a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, selecting the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes.

Results: We introduce a new maximum likelihood model that estimates the speciation, gene duplication, and gene loss events in a gene tree within a species tree with branch lengths. We also provide an algorithm that is efficient in practice and computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance on empirical and simulated data.

Conclusions: In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the LCA (lowest common ancestor) mapping produced by parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework for studying gene duplication.
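As a point of reference for the parsimony baseline the abstract compares against, the classical LCA mapping can be sketched in a few lines. This is a minimal illustration of parsimony reconciliation, not the DrML maximum likelihood algorithm. It assumes a rooted binary species tree encoded as a child-to-parent dict and a gene tree encoded as nested pairs whose leaves are species names; these encodings and the names `depths`, `lca`, and `reconcile` are hypothetical choices for this sketch. Each internal gene node maps to the LCA of its children's species, it is counted as a duplication when it maps to the same species node as one of its children, and losses are counted from the species-tree levels skipped along each gene-tree edge.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical encodings chosen for this sketch:
#   * species tree: child -> parent dict, with the root mapped to None
#   * gene tree: a leaf is the species name of the sampled gene,
#     an internal node is a (left, right) pair
SpeciesParent = Dict[str, Optional[str]]
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]


def depths(parent: SpeciesParent) -> Dict[str, int]:
    """Depth of every species-tree node (the root has depth 0)."""
    memo: Dict[str, int] = {}

    def depth(v: str) -> int:
        if v not in memo:
            p = parent[v]
            memo[v] = 0 if p is None else depth(p) + 1
        return memo[v]

    for v in parent:
        depth(v)
    return memo


def lca(a: str, b: str, parent: SpeciesParent, depth: Dict[str, int]) -> str:
    """Naive LCA: lift the deeper node, then lift both until they meet."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene: GeneTree, parent: SpeciesParent) -> Tuple[str, int, int]:
    """Return (species node of the gene-tree root, #duplications, #losses)
    under the LCA mapping, assuming a binary species tree."""
    depth = depths(parent)

    def walk(g: GeneTree) -> Tuple[str, int, int]:
        if isinstance(g, str):
            return g, 0, 0  # a leaf maps to its own species
        (m1, d1, l1), (m2, d2, l2) = walk(g[0]), walk(g[1])
        m = lca(m1, m2, parent, depth)
        # A gene node is a duplication iff a child maps to the same species node.
        dup = m in (m1, m2)
        # Each species speciation an edge passes through without a gene node
        # implies one loss; at a speciation gene node, the speciation at m is
        # the gene node itself, so it is not counted as a loss.
        shift = 0 if dup else 1
        losses = (depth[m1] - depth[m] - shift) + (depth[m2] - depth[m] - shift)
        return m, d1 + d2 + int(dup), l1 + l2 + losses

    return walk(gene)


if __name__ == "__main__":
    # Species tree ((A,B),C); gene tree ((A,C),B)
    species = {"root": None, "AB": "root", "C": "root", "A": "AB", "B": "AB"}
    print(reconcile((("A", "C"), "B"), species))  # → ('root', 1, 3)
```

The naive LCA walk costs time proportional to the species-tree height per query, which keeps the sketch short; constant-time LCA query structures exist if many gene trees must be reconciled against the same large species tree.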
