Phylogenetic Distance Computation Using CUDA

Some phylogenetic comparative analyses rely on simulation procedures that use a large number of phylogenetic trees to estimate evolutionary correlations. Processing hundreds of thousands of trees is computationally expensive, so these analyses are of limited applicability unless the procedure is implemented efficiently. In this paper, we present a highly parallel and efficient implementation for calculating phylogenetic distances. By exploiting GPU computing with a massive number of threads, we achieve performance gains of up to 243x over a sequential implementation of the same procedures. We also present new data structures and algorithms for efficiently processing irregular pointer-based data structures such as trees. In particular, we present a GPU-based parallel implementation of the lowest common ancestor (LCA) problem. Moreover, the implementation makes intensive use of bitmaps to efficiently encode paths to the tree nodes, and optimizes memory transactions by working with data structures that favor coalesced memory accesses. Our results open up the possibility of dealing with large datasets in evolutionary and ecological analyses.
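
To illustrate how a bitmap-encoded path can replace pointer chasing in an LCA query, the following is a minimal sequential sketch in Python. It is not the implementation described in this paper. It assumes a binary tree whose root-to-node paths fit in a single integer (0 for a left step, 1 for a right step), and the names `PathCode`, `lca_code`, and `patristic_distance`, as well as the toy tree and its branch lengths, are hypothetical. Under these assumptions the LCA of two nodes is the longest common prefix of their path bitmaps, and the distance between them follows from their distances to the root.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathCode:
    """Root-to-node path packed into an integer (hypothetical encoding).

    bits  : path steps, first step in the most significant position
            (0 = left child, 1 = right child)
    depth : number of steps from the root (the root has depth 0)
    """
    bits: int
    depth: int


def lca_code(a: PathCode, b: PathCode) -> PathCode:
    """Return the path code of the lowest common ancestor of a and b."""
    # Truncate both paths to the depth of the shallower node.
    d = min(a.depth, b.depth)
    pa = a.bits >> (a.depth - d)
    pb = b.bits >> (b.depth - d)
    # Everything from the highest differing bit downward is not shared.
    drop = (pa ^ pb).bit_length()
    return PathCode(pa >> drop, d - drop)


def patristic_distance(a: PathCode, b: PathCode, root_dist: dict) -> float:
    """Sum of branch lengths on the path between a and b.

    root_dist maps each node's PathCode to its distance from the root.
    """
    anc = lca_code(a, b)
    return root_dist[a] + root_dist[b] - 2.0 * root_dist[anc]


# Toy tree (assumed for illustration): ((A:2, B:3):1, C:4)
ROOT = PathCode(0b0, 0)
INNER = PathCode(0b0, 1)   # left child of the root, branch length 1
A = PathCode(0b00, 2)      # left, left
B = PathCode(0b01, 2)      # left, right
C = PathCode(0b1, 1)       # right

ROOT_DIST = {ROOT: 0.0, INNER: 1.0, A: 3.0, B: 4.0, C: 4.0}
```

With this toy tree, `lca_code(A, B)` yields the inner node and `lca_code(A, C)` yields the root. Because each query touches only integers and a flat lookup table, independent node pairs could in principle be assigned to separate threads, which is the property a GPU implementation would rely on.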
