Optimal Gene Trees from Sequences and Species Trees Using a Soft Interpretation of Parsimony

Gene duplication, gene loss, and other biological events can result in multiple copies of a gene in a given species. Because of these duplication and loss dynamics, together with variation in sequence evolution and other sources of uncertainty, different gene trees present different evolutionary histories. Their topologies therefore disagree with one another, which leaves consensus species trees ambiguous in places. Other sources of data for building species trees likewise fail to yield completely resolved binary species trees. Nevertheless, speciation events, alongside gene duplication events, leave an underlying phylogenetic signal, which makes it possible to develop algorithms that characterize these processes. To this end, a soft parsimony algorithm has been developed that maps gene trees onto species trees and modifies uncertain or weakly supported branches so as to minimize the number of gene duplication and loss events implied by the tree. The algorithm also allows unrooted trees to be rooted and in-paralogues (lineage-specific duplicates and redundant sequences masquerading as such) to be removed. It is available for download as a software package, Softparsmap.
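To make the idea of mapping a gene tree onto a species tree concrete, the following is a minimal sketch of the standard least-common-ancestor (LCA) mapping used in duplication-loss reconciliation. It is not the Softparsmap algorithm: it counts only duplications, assumes a fully resolved rooted species tree, and does not rearrange weakly supported branches or count losses. The nested-tuple tree encoding, the function names, and the `species_gene` leaf-naming convention are all assumptions made for illustration.

```python
def clusters(tree):
    """Return one frozenset of leaf names per node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def count_duplications(gene_tree, species_tree,
                       species_of=lambda leaf: leaf.split("_")[0]):
    """Count duplication nodes implied by the LCA mapping.

    species_of is a hypothetical convention: a gene leaf named
    "human_a" belongs to the species "human".
    """
    species_clusters = clusters(species_tree)

    def lca(species_set):
        # The clusters containing a set are nested, so the smallest
        # one is the lowest species-tree node covering that set.
        return min((c for c in species_clusters if species_set <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([species_of(node)]))
        child_maps = [walk(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        # A gene node mapping to the same species node as one of its
        # children is inferred to be a duplication.
        if here in child_maps:
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


species_tree = (("human", "mouse"), "fish")
congruent = (("human_a", "mouse_a"), "fish_a")
conflicting = (("human_a", "mouse_a"), ("human_b", "fish_b"))
in_paralogues = (("human_a", "human_b"), "mouse_a")
```

In this sketch, a gene tree that agrees with the species tree implies no duplications, whereas the conflicting tree and the tree containing a lineage-specific duplicate pair each imply one. A parsimony-based method of the kind described above would compare such counts (together with losses) across alternative resolutions of poorly supported branches and keep the cheapest.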
