Nonrandom utilization of codon pairs in Escherichia coli.

We have analyzed protein-coding sequences of Escherichia coli and find that codon-pair utilization is highly biased, reflecting overrepresentation or underrepresentation of many pairs compared with their random expectations. This bias is over and above that contributed by nonrandom use of amino acid pairs, which is itself highly evident. The codon-pair bias is much weaker when nonadjacent codon pairs are examined and virtually disappears when pairs separated by two or three intervening codons are evaluated. There appears to be a high degree of directionality in this bias: any codon that participates in many nonrandom pairs tends to make both over- and underrepresented pairs, but preferentially as a left- or right-hand member. We show a relationship between codon-pair utilization patterns and levels of gene expression: genes encoding proteins expressed at high levels tend to contain more abundant, but more highly underrepresented, codon pairs, relative to genes expressed at low levels. The nonrandom utilization of codon pairs may be a consequence of their effects on translational efficiency, which in turn may be related to the compatibility of adjacent aminoacyl-tRNA isoacceptors at the A and P sites of a translating ribosome.
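To make the comparison of observed codon-pair counts with a "random expectation" concrete, the following is a minimal sketch in Python. The abstract does not state the exact statistic used, so the expectation here is an assumption: each codon pair's expected count is the count of its amino acid pair multiplied by the relative usage of each codon among its synonyms, which factors out amino acid pair bias. The names `split_codons`, `codon_pair_counts`, and the `gap` parameter (number of intervening codons) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame DNA coding sequence into codons, dropping any partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def is_sense(codon):
    """True for a recognized codon that encodes an amino acid (not a stop)."""
    return CODON_TO_AA.get(codon, "*") != "*"


def codon_pair_counts(sequences, gap=0):
    """Return {(codon1, codon2): (observed, expected)} for every observed pair.

    gap is the number of codons between the two members (0 = adjacent).
    ASSUMPTION: expected = N(aa1, aa2) * f(codon1 | aa1) * f(codon2 | aa2),
    an illustrative choice, not necessarily the statistic used in the paper.
    """
    codon_n, aa_n = Counter(), Counter()
    pair_n, aa_pair_n = Counter(), Counter()

    for cds in sequences:
        codons = split_codons(cds)
        for c in codons:
            if is_sense(c):
                codon_n[c] += 1
                aa_n[CODON_TO_AA[c]] += 1
        step = gap + 1
        for left, right in zip(codons, codons[step:]):
            if is_sense(left) and is_sense(right):
                pair_n[(left, right)] += 1
                aa_pair_n[(CODON_TO_AA[left], CODON_TO_AA[right])] += 1

    result = {}
    for (c1, c2), observed in pair_n.items():
        a1, a2 = CODON_TO_AA[c1], CODON_TO_AA[c2]
        expected = (aa_pair_n[(a1, a2)]
                    * (codon_n[c1] / aa_n[a1])
                    * (codon_n[c2] / aa_n[a2]))
        result[(c1, c2)] = (observed, expected)
    return result


if __name__ == "__main__":
    # Toy input for illustration only; a real analysis would use many genes.
    toy = ["ATGGCTGCTGCCGCCAAATAA"]
    for pair, (obs, exp) in sorted(codon_pair_counts(toy).items()):
        print(pair, obs, round(exp, 3))
```

Calling `codon_pair_counts(sequences, gap=1)` or `gap=2` repeats the tally for pairs separated by one or two intervening codons, which is how the decay of the bias with distance could be examined. Keeping the pair keys ordered as (left, right) also allows each codon's behavior as a left-hand versus right-hand member to be tallied separately. Note that this sketch reports only pairs observed at least once, so pairs that are entirely absent would need to be enumerated separately.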