Application of tetranucleotide frequencies for the assignment of genomic fragments.

A basic problem of the metagenomic approach in microbial ecology is assigning genomic fragments to a species or taxonomic group when suitable marker genes are absent. Currently, the (G + C)-content, together with phylogenetic information and the codon adaptation of functional genes, is the most commonly used basis for assessing the relationship between fragments. These methods, however, can produce ambiguous results. To evaluate sequence-based methods for fragment identification, we extensively compared the (G + C)-contents and tetranucleotide usage patterns of 9054 fosmid-sized genomic fragments generated in silico from 118 completely sequenced bacterial genomes (40 982 931 fragment pairs were compared in total). This systematic study shows that correlations between tetranucleotide-derived z-scores discriminate far better than differences in (G + C)-content and provide reasonable assignment probabilities when applied to metagenome libraries of low diversity. Using six fully sequenced fosmid inserts from a metagenomic analysis of microbial consortia mediating the anaerobic oxidation of methane (AOM), we demonstrate that discrimination based on tetranucleotide-derived z-score correlations was consistent with the corresponding 16S ribosomal RNA sequence data and allowed us to discriminate between fosmid inserts that were indistinguishable by their (G + C)-contents.
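The abstract does not spell out how the tetranucleotide z-scores are computed. The sketch below assumes one common choice: tetranucleotides are counted on both strands, each observed count N(n1n2n3n4) is compared with the expectation of a maximal-order Markov model, E = N(n1n2n3) · N(n2n3n4) / N(n2n3), and two fragments are compared by Pearson's correlation of their 256 z-scores. The helper names (`gc_content`, `kmer_counts`, `tetra_zscores`, `pearson`, `fragments`) and the 40 kb fragment size are illustrative assumptions, not the authors' code.

```python
import math
import random
from itertools import product

BASES = "ACGT"
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]  # all 256 words
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of G + C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_counts(seq: str, k: int) -> dict:
    """Overlapping k-mer counts on both strands; words with non-ACGT are skipped."""
    seq = seq.upper()
    counts = {}
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):  # forward + reverse complement
        for i in range(len(strand) - k + 1):
            word = strand[i:i + k]
            if all(c in BASES for c in word):
                counts[word] = counts.get(word, 0) + 1
    return counts


def tetra_zscores(seq: str) -> list:
    """z-score of each tetranucleotide w = n1n2n3n4 under an ASSUMED maximal-order
    Markov model: E = N(n1n2n3) * N(n2n3n4) / N(n2n3)."""
    n2, n3, n4 = (kmer_counts(seq, k) for k in (2, 3, 4))
    scores = []
    for w in TETRAMERS:
        mid = n2.get(w[1:3], 0)    # N(n2n3)
        left = n3.get(w[:3], 0)    # N(n1n2n3)
        right = n3.get(w[1:], 0)   # N(n2n3n4)
        if mid == 0:
            scores.append(0.0)
            continue
        expected = left * right / mid
        variance = expected * (mid - left) * (mid - right) / mid ** 2
        observed = n4.get(w, 0)
        scores.append((observed - expected) / math.sqrt(variance) if variance > 0 else 0.0)
    return scores


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def fragments(genome: str, size: int = 40_000) -> list:
    """Hypothetical helper: cut a genome into non-overlapping fosmid-sized pieces
    (40 kb is an assumed insert length); a remainder shorter than size is dropped."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, size)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic "genomes" with different base composition (illustration only).
    genome_a = "".join(rng.choice("ACGT") for _ in range(80_000))
    genome_b = "".join(rng.choice("GGCCAT") for _ in range(80_000))
    frags = fragments(genome_a) + fragments(genome_b)
    z = [tetra_zscores(f) for f in frags]
    for i in range(len(frags)):
        for j in range(i + 1, len(frags)):  # every unordered pair once
            print(i, j, round(abs(gc_content(frags[i]) - gc_content(frags[j])), 3),
                  round(pearson(z[i], z[j]), 3))
```

Running the script prints, for every unordered fragment pair, the absolute (G + C) difference and the z-score correlation. Correlating z-scores rather than raw tetranucleotide frequencies is intended to factor out the part of the signal already predicted by di- and trinucleotide composition; whether this matches the authors' exact normalization is not stated in the abstract.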
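The reported total of 40 982 931 comparisons is exactly the number of unordered pairs that can be drawn from 9054 fragments, i.e. every fragment was compared with every other fragment once:

$$\binom{9054}{2} = \frac{9054 \times 9053}{2} = \frac{81\,965\,862}{2} = 40\,982\,931$$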
