Selecting taxa to save or sequence: desirable criteria and a greedy solution.

Three desirable properties for any method of selecting a subset of evolutionary units (EUs) for conservation or for genomic sequencing are discussed. These properties are spread, stability, and applicability. We are motivated by a practical case in which the maximization of phylogenetic diversity (PD), which has been suggested as a suitable method, appears to lead to counterintuitive collections of EUs and does not meet these three criteria. We define a simple greedy algorithm (GREEDYMMD) as a close approximation to choosing the subset that maximizes the minimum pairwise distance (MMD) between EUs. GREEDYMMD satisfies our three criteria and may be a useful alternative to PD in real-world situations. In particular, we show that this method of selection is suitable under a model of biodiversity in which features arise and/or disappear during evolution. We also show that if distances between EUs satisfy the ultrametric condition, then GREEDYMMD delivers an optimal subset of EUs that maximizes both the minimum pairwise distance and the PD. Finally, because GREEDYMMD works with distances and does not require a tree, it is readily applicable to many data sets.
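
The abstract does not spell out the steps of GREEDYMMD, so the following is only a minimal sketch of a generic farthest-point greedy heuristic for the max-min distance objective; it may differ from the authors' exact procedure. The function names `greedy_mmd` and `min_pairwise`, the seeding with the most distant pair, and the tie-breaking by lowest index are assumptions made for illustration. The input is a plain symmetric distance matrix, which reflects the point that no tree is required.

```python
from itertools import combinations


def greedy_mmd(dist, k):
    """Greedily pick k EUs so that the minimum pairwise distance stays large.

    dist: symmetric square matrix (list of lists) of non-negative distances.
    k:    number of EUs to select (2 <= k <= number of EUs).
    Returns the selected indices in the order they were chosen.
    """
    n = len(dist)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of EUs")

    # Assumption of this sketch: seed with the most distant pair.
    first, second = max(combinations(range(n), 2),
                        key=lambda pair: dist[pair[0]][pair[1]])
    chosen = [first, second]

    # nearest[i] = distance from EU i to its closest already-chosen EU.
    nearest = [min(dist[i][first], dist[i][second]) for i in range(n)]

    while len(chosen) < k:
        candidates = [i for i in range(n) if i not in chosen]
        # Add the EU that is farthest from everything chosen so far.
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        for i in range(n):
            nearest[i] = min(nearest[i], dist[i][nxt])
    return chosen


def min_pairwise(dist, subset):
    """Minimum pairwise distance within a subset of indices."""
    return min(dist[a][b] for a, b in combinations(subset, 2))


if __name__ == "__main__":
    # Hypothetical ultrametric distances: two tight pairs, far apart.
    D = [
        [0, 2, 10, 10],
        [2, 0, 10, 10],
        [10, 10, 0, 2],
        [10, 10, 2, 0],
    ]
    picked = greedy_mmd(D, 2)
    print(picked, min_pairwise(D, picked))
```

On this toy matrix, selecting two EUs takes one from each tight pair, which is the kind of spread the abstract describes. Keeping the `nearest` list up to date means each greedy step costs time linear in the number of EUs.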
