Human coding and noncoding DNA: compositional correlations.

Because the correlation between the GC level of third codon positions (GC3) and that of intergenic sequences can be used to assess the distribution of genes in the human genome, we studied it in detail. Previous work from our laboratory demonstrated linear correlations among the GC levels of exons, introns, third codon positions, and 5' flanking regions of genes, and those of the long genomic DNA sequences (≥10 kb) or DNA molecules (50-100 kb) in which the genes are embedded. The present study confirms and extends these results using a larger data set. In addition, an analysis of 4270 human genomic DNA and cDNA sequences confirmed a correlation between GC3 and GC1+2 (the GC level of first and second codon positions). Recent additions to the sequence databases also allowed separate analyses of the 5' flanking regions of CpG-island and non-CpG-island genes, as well as analyses of 3' flanking regions. These analyses suggest that the GC levels of 3' flanking regions are closer to those of intergenic DNA than are those of other gene regions.