SUNPLIN: Simulation with Uncertainty for Phylogenetic Investigations

Background: Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. One of them, the expanded tree approach, adds each missing species at a random location within its clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distances between species and stored in a matrix for further statistical analysis. Randomly expanding and processing many phylogenetic trees can therefore be used to estimate phylogenetic uncertainty through a simulation procedure. Because of the computational burden involved, the analyses are of limited applicability unless this procedure is implemented efficiently.

Results: In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees, so that the simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms both for randomly expanding trees and for calculating distance matrices. The source code, written in C++, is publicly available. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/.

Conclusion: We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.
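The two steps described in the abstract, inserting a missing species at a random location within its clade and then computing pairwise phylogenetic distances, can be illustrated with a minimal C++ sketch. This is not the SUNPLIN implementation and makes no attempt at its efficiency. The `Tree` layout and the names `insertSpecies` and `distanceMatrix` are hypothetical, and the sketch assumes that a branch is picked uniformly among the branches of the clade, that the attachment point is uniform along that branch, and that the new tip is given a length that keeps an ultrametric tree ultrametric.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Rooted tree stored as parent pointers; node 0 is the root (parent -1).
// length[v] is the length of the branch joining v to parent[v].
struct Tree {
    std::vector<int> parent;
    std::vector<double> length;
};

// True if `node` lies in the subtree rooted at `clade` (clade itself included).
bool inClade(const Tree& t, int node, int clade) {
    for (int v = node; v != -1; v = t.parent[v])
        if (v == clade) return true;
    return false;
}

// Distance from `node` down to one descendant tip (0 if node is a tip).
double heightBelow(const Tree& t, int node) {
    double h = 0.0;
    int v = node;
    for (bool moved = true; moved;) {
        moved = false;
        for (std::size_t c = 0; c < t.parent.size(); ++c)
            if (t.parent[c] == v) {
                h += t.length[c];
                v = static_cast<int>(c);
                moved = true;
                break;
            }
    }
    return h;
}

// Attach a new tip at a random point on a random branch inside `clade`.
// Returns the index of the new tip, or -1 if the clade has no branch to split.
int insertSpecies(Tree& t, int clade, std::mt19937& rng) {
    std::vector<int> candidates;  // nodes whose parent branch lies inside the clade
    for (std::size_t v = 0; v < t.parent.size(); ++v)
        if (static_cast<int>(v) != clade && inClade(t, static_cast<int>(v), clade))
            candidates.push_back(static_cast<int>(v));
    if (candidates.empty()) return -1;

    std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
    std::uniform_real_distribution<double> frac(0.0, 1.0);
    const int child = candidates[pick(rng)];
    const double below = frac(rng) * t.length[child];  // part kept under the new node
    const double tipLength = below + heightBelow(t, child);

    // Split the chosen branch with a new internal node.
    const int inner = static_cast<int>(t.parent.size());
    t.parent.push_back(t.parent[child]);
    t.length.push_back(t.length[child] - below);
    t.parent[child] = inner;
    t.length[child] = below;

    // Hang the new species from that internal node.
    const int tip = static_cast<int>(t.parent.size());
    t.parent.push_back(inner);
    t.length.push_back(tipLength);
    return tip;
}

// Pairwise path-length (patristic) distances between the listed tips.
std::vector<std::vector<double>> distanceMatrix(const Tree& t,
                                                const std::vector<int>& tips) {
    const std::size_t n = t.parent.size();
    std::vector<double> depth(n, 0.0);  // distance from the root to each node
    for (std::size_t v = 0; v < n; ++v)
        for (int u = static_cast<int>(v); t.parent[u] != -1; u = t.parent[u])
            depth[v] += t.length[u];

    std::vector<std::vector<double>> d(tips.size(),
                                       std::vector<double>(tips.size(), 0.0));
    for (std::size_t i = 0; i < tips.size(); ++i) {
        std::vector<char> isAncestor(n, 0);  // marks tips[i] and its ancestors
        for (int u = tips[i]; u != -1; u = t.parent[u]) isAncestor[u] = 1;
        for (std::size_t j = i + 1; j < tips.size(); ++j) {
            int lca = tips[j];
            while (!isAncestor[lca]) lca = t.parent[lca];  // first shared ancestor
            d[i][j] = d[j][i] = depth[tips[i]] + depth[tips[j]] - 2.0 * depth[lca];
        }
    }
    return d;
}
```

In a simulation, one would copy the incomplete tree, call `insertSpecies` once per missing species, compute `distanceMatrix`, and repeat many times to obtain a distribution of distance matrices. The quadratic scans used here are chosen for brevity only.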
