A Possible Code in the Genetic Code

In order to analyse the genetic code, the distribution of the 64 trinucleotides w (words of 3 letters on the gene alphabet {A,C,G,T}, w∈τ={AAA,⋯,TTT}) in the prokaryotic protein coding genes (long words on the same alphabet) is studied with autocorrelation functions. In coding genes, a trinucleotide w_p can be read in one of 3 frames p (p=0: reference frame, p=1: reference frame shifted by 1 letter, p=2: reference frame shifted by 2 letters). The autocorrelation function w_p(N)_i w′ analyses the occurrence probability of the i-motif w_p(N)_i w′, i.e. 2 trinucleotides, w_p in frame p and w′ in any frame (w,w′∈τ), which are separated by any i bases N (N=A, C, G or T). The 64²×3=12288 autocorrelation functions applied to the prokaryotic protein coding genes are almost all non-random and each has a modulo 3 periodicity of one of the 3 following types: 0 modulo 3, 1 modulo 3 and 2 modulo 3. The classification of the 12288 i-motifs w_p(N)_i w′ according to the type of periodicity implies a constant preferential occurrence frame for w′, independent of w and p. Three subsets of trinucleotides are identified: 22 trinucleotides in frame 0 forming the subset τ0={AAA, AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC, TTT}, and 21 trinucleotides in each of the frames 1 and 2 forming the subsets τ1 and τ2 respectively. Except for AAA, CCC, GGG and TTT, the subsets τ1 and τ2 are generated by a circular permutation P of τ0: P(τ0)=τ1 and P(τ1)=τ2. Furthermore, the complementarity property ∁ of the DNA double helix (i.e. ∁(A)=T, ∁(C)=G, ∁(G)=C, ∁(T)=A and if w=l1l2l3 then ∁(w)=∁(l3)∁(l2)∁(l1) with l1, l2, l3∈{A,C,G,T}) is observed in these 3 subsets: ∁(τ0)=τ0, ∁(τ1)=τ2 and ∁(τ2)=τ1.
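
The set-theoretic properties stated above can be checked mechanically. The following Python sketch is only an illustration, not the procedure used in the study. It assumes that the circular permutation P acts as l1l2l3 → l2l3l1 (the direction is not specified in the text), and it works on the 20 non-homopolymer trinucleotides of each subset, because the text does not say which of CCC and GGG belongs to frame 1 and which to frame 2. All names in the code (TAU0, complement, permute, core0, core1, core2, checks) are hypothetical.

```python
from itertools import product

# Frame-0 subset, copied from the text (22 trinucleotides).
TAU0 = {
    "AAA", "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC", "TTT",
}
HOMOPOLYMERS = {"AAA", "CCC", "GGG", "TTT"}
_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def complement(w: str) -> str:
    """Reverse complement: l1l2l3 -> C(l3)C(l2)C(l1), as defined in the text."""
    return w.translate(_COMPLEMENT_TABLE)[::-1]


def permute(w: str) -> str:
    """Circular permutation, ASSUMED here to be l1l2l3 -> l2l3l1."""
    return w[1:] + w[0]


# The text excludes the four homopolymers from the permutation property,
# so the check is run on the 20 remaining trinucleotides of frame 0.
core0 = TAU0 - HOMOPOLYMERS
core1 = {permute(w) for w in core0}  # candidate frame-1 subset without its homopolymer
core2 = {permute(w) for w in core1}  # candidate frame-2 subset without its homopolymer

all_trinucleotides = {"".join(t) for t in product("ACGT", repeat=3)}

checks = {
    "tau0 is self-complementary": {complement(w) for w in TAU0} == TAU0,
    "complement maps frame 1 onto frame 2": {complement(w) for w in core1} == core2,
    "complement maps frame 2 onto frame 1": {complement(w) for w in core2} == core1,
    "the three cores are pairwise disjoint": len(core0 | core1 | core2) == 60,
    "cores plus homopolymers cover all 64": (
        core0 | core1 | core2 | HOMOPOLYMERS == all_trinucleotides
    ),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Since ∁(CCC)=GGG, placing one of these two homopolymers in each of the frame-1 and frame-2 subsets is compatible with the complementarity relations between them, whichever assignment is chosen. If P is instead taken as l1l2l3 → l3l1l2, the sets computed as core1 and core2 simply exchange roles.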