Computing Phylogenetic Diversity for Split Systems

A central problem in conservation biology is to measure, predict, and preserve biodiversity as species face extinction. In 1992, Faith proposed measuring the diversity of a collection of species in terms of their relationships on a phylogenetic tree, and using this measure to identify collections of species with high diversity. Here, we are interested in variants of the resulting optimization problem that arise when the evolution of the species is better represented by a network than by a tree. More specifically, we consider the problem of computing phylogenetic diversity relative to a split system on a collection of n species. We show that this problem is NP-hard for general split systems. In addition, we provide efficient algorithms for special classes of split systems; in particular, we present an optimal O(n) time algorithm for phylogenetic trees and an O(n log n + nk) time algorithm for choosing an optimal subset of size k relative to a circular split system.
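The abstract does not spell out how diversity is scored against a split system, so the following Python sketch is only an illustration under a stated assumption: the phylogenetic diversity of a subset is taken to be the total weight of the splits that separate at least two of its members. The names `phylogenetic_diversity`, `best_subset_brute_force`, and the toy data in `SPLITS` are hypothetical and do not come from the paper. The exhaustive search is not one of the algorithms announced above; it only makes the optimization problem concrete for tiny inputs.

```python
from itertools import combinations

# Toy weighted split system on four taxa (hypothetical data).
# Each entry is (side_a, side_b, weight); the two sides partition the taxa.
TAXA = frozenset("abcd")
SPLITS = [
    (frozenset("a"), frozenset("bcd"), 1.0),
    (frozenset("b"), frozenset("acd"), 2.0),
    (frozenset("c"), frozenset("abd"), 3.0),
    (frozenset("d"), frozenset("abc"), 4.0),
    (frozenset("ab"), frozenset("cd"), 5.0),
    (frozenset("ac"), frozenset("bd"), 0.5),  # incompatible with ab|cd, so not a tree
]


def phylogenetic_diversity(splits, subset):
    """Assumed definition: sum the weights of all splits that have
    at least one chosen taxon on each side."""
    chosen = set(subset)
    total = 0.0
    for side_a, side_b, weight in splits:
        if chosen & side_a and chosen & side_b:
            total += weight
    return total


def best_subset_brute_force(splits, taxa, k):
    """Try every k-subset and keep the best one.
    Exponential in general; only meant for very small examples."""
    return max(
        combinations(sorted(taxa), k),
        key=lambda candidate: phylogenetic_diversity(splits, candidate),
    )


if __name__ == "__main__":
    best = best_subset_brute_force(SPLITS, TAXA, 2)
    print(best, phylogenetic_diversity(SPLITS, best))
```

On this toy input the pair b, d scores highest, because both of its pendant splits and the split ab|cd separate the two taxa.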