ON THE IMPOSSIBILITY OF ALL OVERLAPPING TRIPLET CODES IN INFORMATION TRANSFER FROM NUCLEIC ACID TO PROTEINS.

It is a generally accepted view that nucleic acids control the synthesis of proteins, and it has been proposed more specifically that the sequence of amino acids in a polypeptide chain is determined by the order of nucleotides in ribo- or deoxyribonucleic acid. The problem of how this determination is effected has come to be known as the "coding" problem. The formal aspects of this problem can be investigated theoretically, and most of the work done in this field has recently been reviewed by Gamow, Rich, and Yčas. Since there are only four different nucleotides in RNA or DNA to determine twenty different amino acids, it is clear that more than one nucleotide must be used to code for each amino acid. Most codes have been constructed on the basis that each amino acid is determined by a set of three nucleotides. Such triplet codes, however, have an excess of information, since there are sixty-four different triplets for the twenty amino acids. In Gamow's original diamond code, several triplets, chosen in a particular way, coded for any given amino acid; the code was therefore "degenerate." This code was also of the overlapping type; that is, the number of nucleotides in the nucleic acid was equal to the number of amino acids in the polypeptide chain. Gamow's diamond code does not, in fact, code for known sequences, and the same is true for the major-minor code, another overlapping triplet code, invented by L. Orgel. These are, however, only two examples of a large number of possible codes of this type which can be obtained by choosing different ways of degenerating the triplets. To test all of these systematically is clearly impossible, and hence it is necessary to have some general theorem about such codes.

The general overlapping triplet code has the following properties.
(i) The coding triplets are chosen from four nucleotides, A, B, C, and D, giving sixty-four different triplets.
(ii) Coding is overlapping, each triplet sharing two nucleotides with the succeeding triplet in a sequence. Thus the sequence ABCDA codes for three amino acids: ABC for the first, BCD for the second, and CDA for the third.
(iii) An amino acid may be represented by more than one triplet; that is, the sixty-four triplets are degenerated into twenty sets.
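
The counting behind properties (i) and (ii) can be illustrated with a short Python sketch. It is an illustration only and not part of the original argument; the function names `all_triplets`, `overlapping_triplets`, and `min_codeword_length` are hypothetical, chosen for this example. The sketch enumerates the triplets over A, B, C, and D, reads a sequence in the fully overlapping manner, and finds the shortest fixed word length that yields at least twenty words from four symbols. A chain of n nucleotides read this way gives n - 2 triplets, so for long chains the number of amino acids is essentially equal to the number of nucleotides.

```python
from itertools import product

# The four nucleotides, using the abstract labels of property (i).
NUCLEOTIDES = "ABCD"


def all_triplets(alphabet=NUCLEOTIDES):
    """Return every ordered triplet over the alphabet (4**3 for four symbols)."""
    return ["".join(t) for t in product(alphabet, repeat=3)]


def overlapping_triplets(sequence):
    """Read a sequence as in property (ii): each triplet shares two
    nucleotides with the next, so the window advances one position at a time."""
    return [sequence[i:i + 3] for i in range(len(sequence) - 2)]


def min_codeword_length(n_symbols=4, n_amino_acids=20):
    """Smallest fixed word length whose number of words is at least n_amino_acids."""
    length = 1
    while n_symbols ** length < n_amino_acids:
        length += 1
    return length


if __name__ == "__main__":
    print(len(all_triplets()))            # number of distinct triplets
    print(overlapping_triplets("ABCDA"))  # the example sequence from property (ii)
    print(min_codeword_length())          # shortest word length covering 20 amino acids
```

Pairs of nucleotides give only sixteen words, fewer than the twenty amino acids, which is why triplets are the shortest fixed-length words that suffice and why sixty-four triplets leave room for the degeneracy described in property (iii).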