Computing Phylogenetic Diversity for Split Systems

A central problem in conservation biology is to measure, predict, and preserve biodiversity as species face extinction. In 1992, Faith proposed measuring the diversity of a collection of species in terms of their relationships on a phylogenetic tree, and using this measure to identify collections of species with high diversity. Here, we study variants of the resulting optimization problem that arise for species whose evolution is better represented by a network than by a tree. More specifically, we consider the problem of computing phylogenetic diversity relative to a split system on a collection of n species. We show that this problem is NP-hard for general split systems. We also provide efficient algorithms for special classes of split systems: an optimal O(n) time algorithm for phylogenetic trees, and an O(n log n + nk) time algorithm for choosing an optimal subset of size k relative to a circular split system.
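To make the objective concrete, here is a minimal Python sketch. It assumes the usual definition of phylogenetic diversity on a weighted split system, which the abstract itself does not spell out: the score of a subset is the total weight of the splits that separate at least two of its members. The names `pd_score` and `best_subset_brute_force`, the split representation, and the toy weights are all hypothetical and chosen for illustration. The exhaustive search only illustrates the optimization problem; it is not the O(n) or O(n log n + nk) algorithm mentioned above.

```python
from itertools import combinations


def pd_score(subset, splits):
    """Assumed definition: sum the weights of all splits A|B that have
    at least one chosen taxon on each side."""
    chosen = set(subset)
    total = 0.0
    for side_a, side_b, weight in splits:
        if chosen & side_a and chosen & side_b:
            total += weight
    return total


def best_subset_brute_force(taxa, splits, k):
    """Try every subset of size k (exponential time; illustration only)."""
    return max(combinations(sorted(taxa), k),
               key=lambda s: pd_score(s, splits))


# Toy split system on four taxa (hypothetical weights): four trivial
# splits plus the split ab|cd, i.e. the splits of a small tree.
taxa = {"a", "b", "c", "d"}
splits = [
    (frozenset({"a"}), frozenset({"b", "c", "d"}), 1.0),
    (frozenset({"b"}), frozenset({"a", "c", "d"}), 2.0),
    (frozenset({"c"}), frozenset({"a", "b", "d"}), 1.0),
    (frozenset({"d"}), frozenset({"a", "b", "c"}), 3.0),
    (frozenset({"a", "b"}), frozenset({"c", "d"}), 2.0),
]

print(pd_score({"b", "d"}, splits))               # 7.0
print(best_subset_brute_force(taxa, splits, 2))   # ('b', 'd')
```

In this toy instance the pair b, d is separated by the trivial splits of b and d and by ab|cd, giving 2 + 3 + 2 = 7, which is the largest score among all pairs.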
