PUmPER: phylogenies updated perpetually

Summary: New sequence data useful for phylogenetic and evolutionary analyses continues to be added to public databases. The construction of multiple sequence alignments and inference of huge phylogenies comprising large taxonomic groups are expensive tasks, both in terms of man hours and computational resources. Therefore, maintaining comprehensive phylogenies, based on representative and up-to-date molecular sequences, is challenging. PUmPER is a framework that can perpetually construct multi-gene alignments (with PHLAWD) and phylogenetic trees (with ExaML or RAxML-Light) for a given NCBI taxonomic group. When sufficient numbers of new gene sequences for the selected taxonomic group have accumulated in GenBank, PUmPER automatically extends the alignment and infers extended phylogenetic trees by using previously inferred smaller trees as starting topologies. Using our framework, large phylogenetic trees can be perpetually updated without human intervention. Importantly, resulting phylogenies are not statistically significantly worse than trees inferred from scratch. Availability and implementation: PUmPER can run in stand-alone mode on a single server, or offload the computationally expensive phylogenetic searches to a parallel computing cluster. Source code, documentation, and tutorials are available at https://github.com/fizquierdo/perpetually-updated-trees. Contact: Fernando.Izquierdo@h-its.org Supplementary information: Supplementary Material is available at Bioinformatics online.

[1]  M. Donoghue,et al.  Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches , 2009, BMC Evolutionary Biology.

[2]  Ziheng Yang Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods , 1994, Journal of Molecular Evolution.

[3]  Alexandros Stamatakis,et al.  Trading Running Time for Memory in Phylogenetic Likelihood Computations , 2012, BIOINFORMATICS.

[4]  Masami Hasegawa,et al.  CONSEL: for assessing the confidence of phylogenetic tree selection , 2001, Bioinform..

[5]  D. Robinson,et al.  Comparison of phylogenetic trees , 1981 .

[6]  J. Eisen,et al.  An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP) , 2008, PloS one.

[7]  Minh Anh Nguyen,et al.  Ultrafast Approximation for Phylogenetic Bootstrap , 2013, Molecular biology and evolution.

[8]  Alexandros Stamatakis,et al.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models , 2006, Bioinform..

[9]  R Henrik Nilsson,et al.  Automated phylogenetic taxonomy: an example in the homobasidiomycetes (mushroom-forming fungi). , 2005, Systematic biology.

[10]  Ziheng Yang,et al.  INDELible: A Flexible Simulator of Biological Sequence Evolution , 2009, Molecular biology and evolution.

[11]  Alexandros Stamatakis,et al.  Novel Parallelization Schemes for Large-Scale Likelihood-based Phylogenetic Inference , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[12]  O. Gascuel,et al.  New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. , 2010, Systematic biology.

[13]  Alexandros Stamatakis,et al.  RAxML-Light: a tool for computing terabyte phylogenies , 2012, Bioinform..

[14]  Alexandros Stamatakis,et al.  Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees , 2011, BMC Bioinformatics.