Species Choice for Comparative Genomics: Being Greedy Works

Several projects investigating genetic function and evolution through the sequencing and comparison of multiple genomes are now underway. These projects consume considerable resources, so the choice of which species to sequence deserves careful planning, potentially involving cooperation among different sequencing centres. A widely discussed criterion for species choice is the maximisation of evolutionary divergence. Our mathematical formalisation of this problem shows, surprisingly, that the best long-term cooperative strategy coincides with the seemingly short-term “greedy” strategy of always choosing the next best single species. Other criteria that influence species choice, such as medical relevance or sequencing costs, can also be accommodated in our approach, which suggests that our results are broadly relevant to scientific policy decisions.
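The greedy strategy can be illustrated with a minimal sketch. It assumes that evolutionary divergence is measured as phylogenetic diversity (PD), meaning the total branch length of the smallest subtree of a known, unrooted tree that connects the chosen species. The tree, its branch lengths, the species names `A` to `E`, and all function names are hypothetical and chosen only for illustration. Because a single species has PD 0, the sketch starts from the most distant pair and then repeatedly adds whichever species increases PD the most. An exhaustive search is included to compare against.

```python
from itertools import combinations

# Hypothetical unrooted tree: (node, node, branch length).
# Internal nodes are n1..n3; leaves A..E stand for candidate species.
EDGES = [
    ("n1", "n2", 1.0),
    ("n1", "A", 4.0),
    ("n1", "B", 2.0),
    ("n2", "n3", 0.5),
    ("n2", "C", 3.0),
    ("n3", "D", 1.0),
    ("n3", "E", 2.5),
]


def build_adjacency(edges):
    """Turn the edge list into an undirected adjacency list."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def leaves(adj):
    """Leaves (degree-1 nodes) are the candidate species, sorted by name."""
    return sorted(node for node, nbrs in adj.items() if len(nbrs) == 1)


def phylogenetic_diversity(adj, chosen):
    """Total branch length of the smallest subtree connecting `chosen`."""
    chosen = set(chosen)
    if len(chosen) < 2:
        return 0.0
    root = next(iter(adj))  # any node works as a traversal root
    total = 0.0

    def count_below(node, parent):
        nonlocal total
        count = 1 if node in chosen else 0
        for child, w in adj[node]:
            if child == parent:
                continue
            below = count_below(child, node)
            # The branch (node, child) is on the spanning subtree exactly
            # when it splits the chosen species into two non-empty groups.
            if 0 < below < len(chosen):
                total += w
            count += below
        return count

    count_below(root, None)
    return total


def greedy_selection(adj, k):
    """Farthest pair first, then add the species with the largest PD gain."""
    species = leaves(adj)
    if not 2 <= k <= len(species):
        raise ValueError("k must be between 2 and the number of species")
    pair = max(combinations(species, 2),
               key=lambda p: phylogenetic_diversity(adj, p))
    chosen = list(pair)
    while len(chosen) < k:
        remaining = [s for s in species if s not in chosen]
        chosen.append(max(remaining,
                          key=lambda s: phylogenetic_diversity(adj, chosen + [s])))
    return chosen


def best_by_exhaustive_search(adj, k):
    """Optimal PD over all k-subsets, for comparison with the greedy choice."""
    return max(phylogenetic_diversity(adj, s)
               for s in combinations(leaves(adj), k))


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for k in range(2, len(leaves(adj)) + 1):
        picked = greedy_selection(adj, k)
        print(k, picked, phylogenetic_diversity(adj, picked),
              best_by_exhaustive_search(adj, k))
```

On this toy tree the greedy PD matches the exhaustive optimum for every k, and each greedy set extends the previous one. This nesting is what lets sequencing centres commit to one species at a time without regret. A single example does not prove the general result. The exhaustive search grows combinatorially and is only practical for small trees.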
