Computational approaches to species phylogeny inference and gene tree reconciliation.

An intricate relation exists between gene trees and species phylogenies, owing to evolutionary processes that act on genes within and across the branches of the species phylogeny. From an analytical perspective, gene trees serve as character states for inferring accurate species phylogenies, and species phylogenies serve as a backdrop against which gene trees are contrasted to elucidate evolutionary processes and parameters. In a 1997 paper, Maddison discussed this relation, reviewed the signatures that three major evolutionary processes leave on gene trees, and surveyed parsimony and likelihood criteria for using these signatures to elucidate the relation computationally. Here, I review the progress made in developing computational methods for analyses under these two criteria, and survey the remaining challenges.

[1]  Yufeng Wu,et al.  An Algorithm for Constructing Parsimonious Hybridization Networks with Multiple Phylogenetic Trees , 2013, RECOMB.

[2]  Johan A. Grahnen,et al.  Toward a General Model for the Evolutionary Dynamics of Gene Duplicates , 2011, Genome biology and evolution.

[3]  A. Drummond,et al.  Bayesian Inference of Species Trees from Multilocus Data , 2009, Molecular biology and evolution.

[4]  Leo van Iersel,et al.  Phylogenetic networks do not need to be complex: using fewer reticulations to represent conflicting clusters , 2009, Bioinform.

[5]  James O. McInerney,et al.  Evolutionary analyses of non-genealogical bonds produced by introgressive descent , 2012, Proceedings of the National Academy of Sciences.

[6]  L. Nakhleh,et al.  Algorithmic Strategies for Estimating the Amount of Reticulation from a Collection of Gene Trees , 2010 .

[7]  Anders Eriksson,et al.  Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins , 2012, Proceedings of the National Academy of Sciences.

[8]  Daniel H. Huson,et al.  Phylogenetic Networks: Introduction to phylogenetic networks , 2010 .

[9]  J. Lake,et al.  Horizontal gene transfer accelerates genome innovation and evolution. , 2003, Molecular biology and evolution.

[10]  Luay Nakhleh,et al.  Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. , 2011, Systematic biology.

[11]  Manolis Kellis,et al.  Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny , 2011, Molecular biology and evolution.

[12]  Luay Nakhleh,et al.  The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection , 2012, PLoS genetics.

[13]  Matthew W. Hahn,et al.  Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution , 2007, Genome Biology.

[14]  Luay Nakhleh,et al.  Confounding Factors in HGT Detection: Statistical Error, Coalescent Effects, and Multiple Solutions , 2007, J. Comput. Biol.

[15]  D. Morrison,et al.  Networks in phylogenetic analysis: new tools for population biology. , 2005, International journal for parasitology.

[16]  Yufeng Wu,et al.  Close lower and upper bounds for the minimum reticulate network of multiple phylogenetic trees , 2010, Bioinform.

[17]  N. Moran,et al.  Evolutionary Origins of Genomic Repertoires in Bacteria , 2005, PLoS biology.

[18]  S. Carroll,et al.  Genome-scale approaches to resolving incongruence in molecular phylogenies , 2003, Nature.

[19]  Laura Salter Kubatko,et al.  STEM: species tree estimation using maximum likelihood for gene trees under coalescence , 2009, Bioinform.

[20]  Bengt Oxelman,et al.  Statistical inference of allopolyploid species networks in the presence of incomplete lineage sorting. , 2012, Systematic biology.

[21]  Artem Cherkasov,et al.  Towards Improved Assessment of Functional Similarity in Large-Scale Screens: A Study on Indel Length , 2010, J. Comput. Biol.

[22]  A. Hobolth,et al.  Genomic Relationships and Speciation Times of Human, Chimpanzee, and Gorilla Inferred from a Coalescent Hidden Markov Model , 2006, PLoS genetics.

[23]  M. Slatkin,et al.  The Concordance of Gene Trees and Species Trees at Two Linked Loci , 2006, Genetics.

[24]  Yufeng Wu,et al.  Coalescent-Based Species Tree Inference from Gene Tree Topologies Under Incomplete Lineage Sorting by Maximum Likelihood , 2012, Evolution; international journal of organic evolution.

[25]  Tandy J. Warnow,et al.  Fast and accurate methods for phylogenomic analyses , 2011, BMC Bioinformatics.

[26]  Manolis Kellis,et al.  Unified modeling of gene duplication, loss, and coalescence using a locus tree. , 2012, Genome research.

[27]  Loren H Rieseberg,et al.  Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus). , 2012, Molecular phylogenetics and evolution.

[28]  J. Stiller Experimental design and statistical rigor in phylogenomics of horizontal and endosymbiotic gene transfer , 2011, BMC Evolutionary Biology.

[29]  Guohua Jin,et al.  Bootstrap-based Support of HGT Inferred by Maximum Parsimony , 2010, BMC Evolutionary Biology.

[30]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 1981, Journal of Molecular Evolution.

[31]  Oliver Eulenstein,et al.  Reconciling Phylogenetic Trees , 2011 .

[32]  Tandy J. Warnow,et al.  Algorithms for MDC-Based Multi-Locus Phylogeny Inference: Beyond Rooted Binary Gene Trees on Single Alleles , 2011, J. Comput. Biol.

[33]  Luay Nakhleh,et al.  SPR-based Tree Reconciliation: Non-binary Trees and Multiple Solutions , 2008, APBC.

[34]  D. Posada,et al.  Characterization of Reticulate Networks Based on the Coalescent with Recombination , 2008, Molecular biology and evolution.

[35]  Philipp W. Messer,et al.  Genome Patterns of Selection and Introgression of Haplotypes in Natural Populations of the House Mouse (Mus musculus) , 2012, PLoS genetics.

[36]  Patricia A. McLenachan,et al.  A Statistical Approach for Distinguishing Hybridization and Incomplete Lineage Sorting , 2009, The American Naturalist.

[37]  T. Tuller,et al.  Inferring phylogenetic networks by the maximum parsimony criterion: a case study. , 2006, Molecular biology and evolution.

[38]  Sagi Snir,et al.  Maximum likelihood of phylogenetic networks , 2006, Bioinform.

[39]  Luay Nakhleh,et al.  Parsimonious inference of hybridization in the presence of incomplete lineage sorting. , 2013, Systematic biology.

[40]  Charles Semple,et al.  On the Computational Complexity of the Rooted Subtree Prune and Regraft Distance , 2005 .

[41]  C. Ané,et al.  Comparing two Bayesian methods for gene tree/species tree reconstruction: simulations with incomplete lineage sorting and horizontal gene transfer. , 2011, Systematic biology.

[42]  G. Moore,et al.  Fitting the gene lineage into its species lineage , 1979 .

[43]  L. Kubatko,et al.  Inconsistency of phylogenetic estimates from concatenated data under coalescence. , 2007, Systematic biology.

[44]  Bengt Oxelman,et al.  Inferring Species Networks from Gene Trees in High-Polyploid North American and Hawaiian Violets (Viola, Violaceae) , 2011, Systematic biology.

[45]  Dannie Durand,et al.  A hybrid micro-macroevolutionary approach to gene tree reconstruction. , 2006 .

[46]  W. Maddison Gene Trees in Species Trees , 1997 .

[47]  Adam Siepel,et al.  Phylogenomics of primates and their ancestral populations. , 2009, Genome research.

[48]  U. Gophna,et al.  The complexity hypothesis revisited: connectivity rather than function constitutes a barrier to horizontal gene transfer. , 2011, Molecular biology and evolution.

[49]  Simon H. Martin,et al.  Butterfly genome reveals promiscuous exchange of mimicry adaptations among species , 2012, Nature.

[50]  Tandy J. Warnow,et al.  Inferring Optimal Species Trees Under Gene Duplication and Loss , 2013, Pacific Symposium on Biocomputing.

[51]  A. Hobolth,et al.  Ancestral Population Genomics: The Coalescent Hidden Markov Model Approach , 2009, Genetics.

[52]  Jonathan F. Wendel,et al.  Phylogenetic Incongruence: Window into Genome History and Molecular Evolution , 1998 .

[53]  Ziheng Yang,et al.  Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. , 2003, Genetics.

[54]  Mark A Ragan,et al.  Untangling hybrid phylogenetic signals: horizontal gene transfer and artifacts of phylogenetic reconstruction. , 2009, Methods in molecular biology.

[55]  Louxin Zhang,et al.  From Gene Trees to Species Trees II: Species Tree Inference by Minimizing Deep Coalescence Events , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[56]  J. Mallet Hybrid speciation , 2007, Nature.

[57]  L. Kubatko Identifying hybridization events in the presence of coalescence via model selection. , 2009, Systematic biology.

[58]  Luay Nakhleh,et al.  Integrating Sequence and Topology for Efficient and Accurate Detection of Horizontal Gene Transfer , 2008, RECOMB-CG.

[59]  Olga K. Kamneva,et al.  Analysis of Genome Content Evolution in PVC Bacterial Super-Phylum: Assessment of Candidate Genes Associated with Cellular Organization and Lifestyle , 2012, Genome biology and evolution.

[60]  Noah A Rosenberg,et al.  Gene tree discordance, phylogenetic inference and the multispecies coalescent. , 2009, Trends in ecology & evolution.

[61]  P. Marjoram,et al.  Ancestral Inference from Samples of DNA Sequences with Recombination , 1996, J. Comput. Biol.

[62]  L. Boto Horizontal gene transfer in evolution: facts and challenges , 2010, Proceedings of the Royal Society B: Biological Sciences.

[63]  Vincent Moulton,et al.  Using supernetworks to distinguish hybridization from lineage-sorting , 2008, BMC Evolutionary Biology.

[64]  Brian C. Thomas,et al.  Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. , 2006, Genome research.

[65]  Charles Semple,et al.  Computing the minimum number of hybridization events for a consistent evolutionary history , 2007, Discret. Appl. Math.

[66]  J. Lagergren,et al.  Simultaneous Bayesian gene tree reconstruction and reconciliation analysis , 2009, Proceedings of the National Academy of Sciences.

[67]  B. Larget,et al.  Bayesian estimation of concordance among gene trees. , 2006, Molecular biology and evolution.

[68]  L. Nakhleh Evolutionary Phylogenetic Networks: Models and Issues , 2010 .

[69]  Dannie Durand,et al.  Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees , 2012, Bioinform.

[70]  Ge Xia,et al.  Seeing the trees and their branches in the network is hard , 2007, Theor. Comput. Sci.

[71]  Elchanan Mossel,et al.  Incomplete Lineage Sorting: Consistent Phylogeny Estimation from Multiple Loci , 2007, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[72]  Luay Nakhleh,et al.  Species Tree Inference by Minimizing Deep Coalescences , 2009, PLoS Comput. Biol.

[73]  D. Liberles,et al.  Evolution after gene duplication , 2010 .

[74]  Luay Nakhleh,et al.  MURPAR: A Fast Heuristic for Inferring Parsimonious Phylogenetic Networks from Multiple Gene Trees , 2012, ISBRA.

[75]  D. Pearl,et al.  Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. , 2007, Systematic biology.

[76]  Bin Ma,et al.  From Gene Trees to Species Trees , 2000, SIAM J. Comput.

[77]  Noah A. Rosenberg,et al.  Consistency Properties of Species Tree Inference by Minimizing Deep Coalescences , 2011, J. Comput. Biol.

[78]  David A. Liberles,et al.  The power-law distribution of gene family size is driven by the pseudogenisation rate's heterogeneity between gene families. , 2008, Gene.

[79]  Loren H Rieseberg,et al.  A genomic view of introgression and hybrid speciation. , 2007, Current opinion in genetics & development.

[80]  F. Kondrashov,et al.  The evolution of gene duplications: classifying and distinguishing between models , 2010, Nature Reviews Genetics.

[81]  Daniel H. Huson,et al.  Summarizing Multiple Gene Trees Using Cluster Networks , 2008, WABI.

[82]  Chung-I Wu,et al.  Inferences of species phylogeny in relation to segregation of ancient polymorphisms. , 1991, Genetics.

[83]  Philip L. F. Johnson,et al.  A Draft Sequence of the Neandertal Genome , 2010, Science.

[84]  Luay Nakhleh,et al.  Inference of parsimonious species phylogenies from multi-locus data , 2010 .

[85]  Luay Nakhleh,et al.  PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships , 2008, BMC Bioinformatics.

[86]  Steven Kelk,et al.  Phylogenetic Networks: Concepts, Algorithms and Applications , 2012 .

[87]  Tandy J. Warnow,et al.  Algorithms for MDC-Based Multi-locus Phylogeny Inference , 2011, RECOMB.

[88]  Matthew J. Betts,et al.  Optimal Gene Trees from Sequences and Species Trees Using a Soft Interpretation of Parsimony , 2006, Journal of Molecular Evolution.

[89]  W. Maddison,et al.  Inferring phylogeny despite incomplete lineage sorting. , 2006, Systematic biology.

[90]  Joel Sjöstrand,et al.  DLRS: gene tree evolution in light of a species tree , 2012, Bioinform.

[91]  Qixin He,et al.  Full modeling versus summarizing gene-tree uncertainty: method choice and species-tree accuracy. , 2012, Molecular phylogenetics and evolution.

[92]  Oliver Eulenstein,et al.  Maximum likelihood models and algorithms for gene tree evolution with duplications and losses , 2011, BMC Bioinformatics.

[93]  Laura Kubatko,et al.  Estimating species trees : practical and theoretical aspects , 2010 .

[94]  Hayley C. Lanier,et al.  Is recombination a problem for species-tree analyses? , 2012, Systematic biology.

[95]  Simone Linz,et al.  On the complexity of computing the temporal hybridization number for two phylogenies , 2013, Discret. Appl. Math.

[96]  D. Huson,et al.  Application of phylogenetic networks in evolutionary studies. , 2006, Molecular biology and evolution.

[97]  Laura Salter Kubatko,et al.  Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: a model. , 2009, Theoretical population biology.

[98]  Manolis Kellis,et al.  Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss , 2012, Bioinform.

[99]  Qixin He,et al.  Sources of error inherent in species-tree estimation: impact of mutational and coalescent effects on accuracy and implications for choosing among different methods. , 2010, Systematic biology.

[100]  Steven Kelk,et al.  Networks: expanding evolutionary thinking. , 2013, Trends in genetics : TIG.