Metagenomics and the global ocean survey: what's in it for us, and why should we care?

Recently, a special Oceanic Metagenomics Collection of articles from the J Craig Venter Institute was published in PLoS Biology, available at: http://collections.plos.org/plosbiology/gos-2007. At first glance, the publication represents a very large (and very welcome) addition of data to the nascent field of marine microbial metagenomics. These data, consisting of more than 7.7 million sequencing reads (>6 billion base pairs of sequence), reveal more new genes, more new proteins, more diversity and a more complex ocean than might have been thought; yet they do not begin to touch the real complexity of the ocean ecosystem(s). The data were gathered from 41 sites, primarily marine, covering a transect that includes a sample about every 330 km for more than 8000 km, from the North Atlantic, southwards along the eastern edge of North America, through the Panama Canal, and onward towards the South Pacific. In addition, there is extensive coverage of the waters around the Galapagos Islands. Included in the dataset are previously studied samples from the Sargasso Sea (Venter et al., 2004).
A deeper look, however, reveals that these impressive numbers are the tip of an intellectual iceberg of fascinating inconsistencies with regard to marine microbial diversity. Indeed, what is not in the dataset may well offer opportunities for future studies that transcend those lying in the dataset itself. To understand what is not there, one needs to keep in mind where and how the samples were collected: these are all near-surface (within a few meters) samples that were filtered multiple times to yield a size fraction in the 0.2–0.8 µm range. Thus, the sample can be aptly characterized as the near-surface marine planktonic niche, consisting mostly of unattached, single cells. Other organisms should have been removed by the larger 0.8 µm filters, which remain as a resource for further study.
As for what is contained in the dataset, there is something for almost everyone. Rusch et al. (2007) lead off with a synopsis of the gene data – new genes galore, new phylotypes galore and the conclusion that in this niche there is still to be found an impressive array of diversity at both the taxonomic and biochemical levels. This being said, however, the dominant species are remarkably few in number. If one simply removes all ‘abundant’ species that occur at only one site, as well as those that are found only in the non-marine (hypersaline, mangrove and freshwater) sites, the number of dominant groups that characterize this marine planktonic niche decreases to about 10–20 (depending on whether you are a splitter or a grouper). This is quite remarkable: perhaps the paradox of the plankton is not a paradox at all, but is hidden in the way that microbiologists define diversity, and in our understanding of what is being competed for in the so-called uniform ocean. Of these dominant groups, only three (Synechococcus, Prochlorococcus and Pelagibacter ubique, a SAR-11 type) have been cultivated and have genomic sequences available. However, within these abundant species can be found an impressive array of diversity, so impressive that in no case was it possible to assemble a genome from any of them. Thus, while taxonomic/phylogenetic diversity was quite limited, the diversity at the gene level was remarkably high, an observation that fits with several previous studies of localized sites but now appears to be a general feature of the marine planktonic environment. Given these challenges, some new approaches were adopted to try to understand this immense diversity. For example, 584 sequenced genomes in finished or draft form were used for ‘fragment recruitment’ of the entire database.
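The filtering step described above, in which ‘abundant’ taxa seen at only one site or only at non-marine sites are set aside, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure used by Rusch et al. (2007): the function name, the taxon and site labels, the marine habitat labels and the toy table are all hypothetical; only the three non-marine habitat categories come from the text.

```python
from typing import Dict, Set

# The three non-marine habitat categories named in the text.
NON_MARINE_HABITATS = {"hypersaline", "mangrove", "freshwater"}


def dominant_marine_groups(
    abundant_at: Dict[str, Set[str]],
    habitat_of: Dict[str, str],
) -> Set[str]:
    """Return taxa that are abundant at more than one site and at
    least one of those sites is marine (hypothetical helper)."""
    kept = set()
    for taxon, sites in abundant_at.items():
        if len(sites) <= 1:
            continue  # abundant at only one site: set aside
        if all(habitat_of[s] in NON_MARINE_HABITATS for s in sites):
            continue  # found only at non-marine sites: set aside
        kept.add(taxon)
    return kept


# Invented toy data, for illustration only (not GOS results).
habitat_of = {
    "site_A": "open ocean",
    "site_B": "coastal",
    "site_C": "freshwater",
    "site_D": "mangrove",
}
abundant_at = {
    "taxon_1": {"site_A", "site_B"},  # two marine sites: kept
    "taxon_2": {"site_B"},            # single site: removed
    "taxon_3": {"site_C", "site_D"},  # non-marine only: removed
    "taxon_4": {"site_A", "site_C"},  # one marine site among two: kept
}

print(sorted(dominant_marine_groups(abundant_at, habitat_of)))
# prints ['taxon_1', 'taxon_4']
```

How finely the taxa are defined before this step (the splitter versus grouper choice) changes the count that comes out, which is why the text gives a range of about 10–20 rather than a single number.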
The ISME Journal (2007) 1, 185–190 © 2007 International Society for Microbial Ecology All rights reserved 1751-7362/07 $30.00
Remarkably, only 30% of the database revealed recruitment to any of the 584 genomes: 15% recruited to three genomes of the ‘marine planktonic niche’ (Pelagibacter, Prochlorococcus and Synechococcus), while 15% recruited to two genomes that appeared at only one site in the global ocean survey (GOS) (Shewanella and Burkholderia). In terms of understanding the nature of diversity in the marine planktonic niche, such information tells us that the sequencing of the other dominant species should be a high-priority item – one that will allow retrospective fragment