Hands-on Introduction to Sequence-Length Requirements in Phylogenetics

In this tutorial, we use a series of analytical computations and numerical simulations to review known insights into a fundamental question: how much data is needed to reconstruct the Tree of Life? Python code for this tutorial, including a Jupyter notebook, is provided.
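To make the question concrete, here is a minimal sketch of the kind of numerical simulation involved. It is not the tutorial's notebook code. It assumes the two-state Cavender-Farris-Neyman (CFN) model on a single quartet with true split 12|34, log-corrected CFN distances, and the four-point method for choosing the split. All function names (`simulate_cfn_quartet`, `cfn_distance`, `four_point_correct`, `success_rate`) and parameters (`p_ext`, `p_int` as per-edge flip probabilities) are hypothetical and chosen only for illustration. The sketch estimates how often the true split is recovered from `k` sites.

```python
import numpy as np


def simulate_cfn_quartet(k, p_ext, p_int, rng):
    """Simulate k i.i.d. binary sites on the quartet 12|34 under the CFN model.

    Illustrative layout (an assumption): leaves 1 and 2 hang off internal
    node a, leaves 3 and 4 off internal node b, and a-b is the internal edge.
    p_ext is the flip probability on each pendant edge, p_int on edge a-b.
    Returns a boolean array of shape (4, k), one row per leaf.
    """
    a = rng.random(k) < 0.5                      # uniform state at node a
    b = a ^ (rng.random(k) < p_int)              # flip across the internal edge
    parents = (a, a, b, b)                       # leaves 1,2 under a; 3,4 under b
    return np.array([par ^ (rng.random(k) < p_ext) for par in parents])


def cfn_distance(x, y):
    """Log-corrected CFN distance -1/2 * log(1 - 2 * p_hat)."""
    p_hat = np.mean(x != y)                      # fraction of differing sites
    if p_hat >= 0.5:
        return np.inf                            # saturated: no usable signal
    return -0.5 * np.log(1.0 - 2.0 * p_hat)


def four_point_correct(leaves):
    """True iff the four-point condition selects the true split 12|34."""
    def d(i, j):
        return cfn_distance(leaves[i], leaves[j])

    s12 = d(0, 1) + d(2, 3)
    s13 = d(0, 2) + d(1, 3)
    s14 = d(0, 3) + d(1, 2)
    # Strict inequality: ties, or an infinite s12, count as failures.
    return bool(s12 < min(s13, s14))


def success_rate(k, p_ext, p_int, trials=200, seed=0):
    """Fraction of simulated length-k datasets on which 12|34 is recovered."""
    rng = np.random.default_rng(seed)
    wins = sum(four_point_correct(simulate_cfn_quartet(k, p_ext, p_int, rng))
               for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for k in (10, 100, 1000, 10000):
        print(k, success_rate(k, p_ext=0.3, p_int=0.05))
```

As `k` grows, the success rate should approach 1. Shrinking `p_int` (a shorter internal edge) or pushing `p_ext` toward 1/2 (longer pendant edges) should increase the number of sites needed, which is the kind of trade-off that sequence-length requirements quantify.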
