Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython

Background: Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines.

Results: We built a Python software library for working with phylogenetic data that is tightly integrated with Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees, performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API, providing a common set of functionality independent of the data source.

Conclusions: Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available through the Biopython website, http://biopython.org.
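
To make the "consistent and simple API" concrete, the following is a minimal sketch of reading, inspecting, converting and drawing a tree with Bio.Phylo. It assumes Biopython is installed; the Newick string and the taxon names A to D are hypothetical illustration data, not taken from the paper.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical example tree in Newick format (illustration only).
NEWICK = "((A:1.0,B:2.0):1.0,(C:3.0,D:4.0):0.5);"

# Read one tree; the same call takes "nexus" or "phyloxml" as the format name.
tree = Phylo.read(StringIO(NEWICK), "newick")

# Basic inspection, independent of the format the tree came from.
leaf_names = [leaf.name for leaf in tree.get_terminals()]
total_length = tree.total_branch_length()

# Path length between two tips, selected by their name attribute.
dist_ab = tree.distance({"name": "A"}, {"name": "B"})

# Write the same tree out in a different format (NEXUS) to an in-memory buffer.
buffer = StringIO()
Phylo.write(tree, buffer, "nexus")
nexus_text = buffer.getvalue()

# Plain-text rendering of the tree on standard output.
Phylo.draw_ascii(tree)
```

Because the format is only a string argument to `Phylo.read` and `Phylo.write`, the inspection code in the middle would stay the same if the input were a NEXUS or phyloXML file instead.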
