Synthesis of phylogeny and taxonomy into a comprehensive tree of life

Significance

Scientists have used gene sequences and morphological data to construct tens of thousands of evolutionary trees that describe the evolutionary history of animals, plants, and microbes. This study is the first, to our knowledge, to apply an efficient and automated process for assembling published trees into a complete tree of life. This tree and the underlying data are available to browse and download from the Internet, facilitating subsequent analyses that require evolutionary trees. The tree can be easily updated with newly published data. Our analysis of coverage not only reveals gaps in sampling and naming biodiversity but also further demonstrates that most published phylogenies are not available in digital formats that can be summarized into a tree of life.

Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips: the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.

[26]  Systema Naturae 250 - The Linnaean Ark , 2010 .

[27]  Mike Steel,et al.  Phylogenomics with incomplete taxon coverage: the limits to inference , 2010, BMC Evolutionary Biology.

[28]  John Edmondson,et al.  Systema Naturae 250: The Linnaean Ark , 2011 .

[29]  R. Henrik Nilsson,et al.  Progress in molecular and morphological taxon discovery in Fungi and options for formal classification of environmental sequences , 2011 .

[30]  C. Mora,et al.  How Many Species Are There on Earth and in the Ocean? , 2011, PLoS biology.

[31]  Daniel J. G. Lahr,et al.  Estimating the timing of early eukaryotic diversification with multigene molecular clocks , 2011, Proceedings of the National Academy of Sciences.

[32]  Arlin Stoltzfus,et al.  Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis , 2012, BMC Research Notes.

[33]  Hilmar Lapp,et al.  NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata , 2012, Systematic biology.

[34]  Laura Wegener Parfrey,et al.  Turning the crown upside down: gene tree parsimony roots the eukaryotic tree of life. , 2012, Systematic biology.

[35]  M. Syvanen,et al.  Evolutionary implications of horizontal gene transfer. , 2012, Annual review of genetics.

[36]  Tomasello,et al.  A congruent phylogenomic signal places eukaryotes within the Archaea , 2012, Proceedings of the Royal Society B: Biological Sciences.

[37]  W. Jetz,et al.  The global diversity of birds in space and time , 2012, Nature.

[38]  K. Eric Wommack,et al.  Groundtruthing Next-Gen Sequencing for Microbial Ecology–Biases and Errors in Community Structure Estimates from PCR Amplicon Pyrosequencing , 2012, PloS one.

[39]  Simon P. Wilson,et al.  Predicting total global species richness using rates of species description and estimates of taxonomic effort. , 2012, Systematic biology.

[40]  B. Lang,et al.  Rooting the eukaryotic tree with mitochondrial and bacterial proteins. , 2012, Molecular biology and evolution.

[41]  D. Bhattacharya,et al.  Algal endosymbionts as vectors of horizontal gene transfer in photosynthetic eukaryotes , 2013, Front. Plant Sci..

[42]  Joseph W. Brown,et al.  Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs , 2013, PLoS Comput. Biol..

[43]  R. Cordaux,et al.  Horizontal Transfer and Evolution of Prokaryote Transposable Elements in Eukaryotes , 2013, Genome biology and evolution.

[44]  Charles Semple,et al.  Amalgamating source trees with different taxonomic levels. , 2013, Systematic biology.

[45]  Pelin Yilmaz,et al.  The SILVA ribosomal RNA gene database project: improved data processing and web-based tools , 2012, Nucleic Acids Res..

[46]  Nicholas H. Putnam,et al.  The Genome of the Ctenophore Mnemiopsis leidyi and Its Implications for Cell Type Evolution , 2013, Science.

[47]  Keith A. Crandall,et al.  Lost Branches on the Tree of Life , 2013, PLoS biology.

[48]  Peter Murray-Rust,et al.  AMI-diagram: Mining Facts from Images , 2014, D Lib Mag..

[49]  S. Baldauf,et al.  An Alternative Root for the Eukaryote Tree of Life , 2014, Current Biology.

[50]  Pelin Yilmaz,et al.  The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks , 2013, Nucleic Acids Res..

[51]  Andrew F. Magee,et al.  The Dawn of Open Access to Phylogenetic Data , 2014, PloS one.

[52]  I. C. Winder,et al.  Reticulate evolution and the human past: an anthropological perspective , 2014, Annals of human biology.

[53]  David Fernández-Baca,et al.  Constructing and Employing Tree Alignment Graphs for Phylogenetic Synthesis , 2015, AlCoB.

[54]  Karen Cranston,et al.  Phylesystem: a git-based data store for community-curated phylogenetic estimates , 2015, Bioinform..

[55]  Filipa L. Sousa,et al.  Origins of major archaeal clades correspond to gene acquisitions from bacteria , 2014, Nature.