Introduction to Computational Phylogenetics

This manuscript is a draft and should not be distributed. Some of the material in this text appeared verbatim in unpublished notes for the course "Computational methods in linguistic reconstruction" taught for the LSA Institute in 2009 at the
