iGTP: A software package for large-scale gene tree parsimony analysis

Background: The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle.

Results: We introduce iGTP, a platform-independent software program that implements state-of-the-art algorithms that greatly speed up species tree inference under the duplication, duplication-loss, and deep coalescence reconciliation costs. iGTP significantly extends and improves the functionality and performance of existing gene tree parsimony software and offers advanced features such as building effective initial trees using stepwise leaf addition and the ability to have unrooted gene trees in the input. Moreover, iGTP provides a user-friendly graphical interface with integrated tree visualization software to facilitate analysis of the results.

Conclusions: iGTP enables, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication, duplication-loss, and deep coalescence reconciliation costs, all from within a convenient graphical user interface.
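To make the gene tree parsimony criterion concrete, the following is a minimal sketch of the duplication cost only, under stated assumptions. It is not iGTP's implementation and uses none of its algorithms: trees are assumed to be rooted and given as nested Python tuples whose leaves are species names, the function names are hypothetical, and the candidate species trees are enumerated by hand instead of being found by heuristic search. Each gene tree node is mapped to the smallest species tree cluster containing its children's mappings (the least common ancestor), and a node counts as a duplication when it maps to the same cluster as one of its children.

```python
def clusters(tree):
    """Return the leaf set (cluster) of every node in a nested-tuple tree."""
    out = []

    def walk(node):
        if isinstance(node, str):
            s = frozenset([node])
        else:
            s = frozenset().union(*(walk(child) for child in node))
        out.append(s)
        return s

    walk(tree)
    return out


def lca(species_clusters, leaves):
    """Smallest species tree cluster that contains all the given leaves."""
    return min((c for c in species_clusters if leaves <= c), key=len)


def duplication_cost(gene_tree, species_tree):
    """Count gene tree nodes that map to the same species node as a child."""
    species_clusters = clusters(species_tree)
    dups = 0

    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            return lca(species_clusters, frozenset([node]))
        child_maps = [walk(child) for child in node]
        mapped = lca(species_clusters, frozenset().union(*child_maps))
        if mapped in child_maps:
            dups += 1  # node and one of its children map to the same place
        return mapped

    walk(gene_tree)
    return dups


def total_cost(gene_trees, species_tree):
    """Reconciliation cost of a species tree summed over all gene trees."""
    return sum(duplication_cost(g, species_tree) for g in gene_trees)


# Illustrative data (made up): three gene trees, two of which agree.
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]

# Exhaustive search is only feasible for a handful of taxa.
best = min(candidates, key=lambda s: total_cost(gene_trees, s))
print(best, total_cost(gene_trees, best))
```

Exhaustive enumeration works here only because three taxa admit three rooted binary topologies; the number of topologies grows far too quickly for data sets with hundreds of taxa, which is why heuristic search with a good starting tree matters at that scale.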
