To analyse the genetic code, the distribution of the 64 trinucleotides w (words of 3 letters on the gene alphabet {A,C,G,T}, w ∈ τ = {AAA, …, TTT}) in the prokaryotic protein coding genes (words of large size) is studied with autocorrelation functions. In coding genes, a trinucleotide w_p can be read in 3 frames p (p=0: reference frame; p=1: reference frame shifted by 1 letter; p=2: reference frame shifted by 2 letters). The autocorrelation function w_p(N)_i w′ analyses the occurrence probability of the i-motif w_p(N)_i w′, i.e. 2 trinucleotides, w_p in frame p and w′ in any frame (w, w′ ∈ τ), separated by any i bases N (N = A, C, G or T). The 64² × 3 = 12288 autocorrelation functions applied to the prokaryotic protein coding genes are almost all non-random and show a modulo 3 periodicity of one of the 3 following types: 0 modulo 3, 1 modulo 3 and 2 modulo 3. The classification of the 12288 i-motifs w_p(N)_i w′ according to the type of periodicity implies a constant preferential occurrence frame for w′, independent of w and p. Three subsets of trinucleotides are identified: 22 trinucleotides in frame 0 forming the subset τ0 = {AAA, AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC, TTT}, and 21 trinucleotides in each of the frames 1 and 2 forming the subsets τ1 and τ2 respectively. Except for AAA, CCC, GGG and TTT, the subsets τ1 and τ2 are generated by a circular permutation P of τ0: P(τ0) = τ1 and P(τ1) = τ2. Furthermore, the complementarity property ∁ of the DNA double helix (i.e. ∁(A)=T, ∁(C)=G, ∁(G)=C, ∁(T)=A and, if w = l1l2l3, then ∁(w) = ∁(l3)∁(l2)∁(l1) with l1, l2, l3 ∈ {A,C,G,T}) is observed in these 3 subsets: ∁(τ0) = τ0, ∁(τ1) = τ2 and ∁(τ2) = τ1.
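The set properties stated above can be checked mechanically, and the i-motif statistic can be illustrated on a toy sequence. The following Python sketch is illustrative only and rests on assumptions that are not in the text: the circular permutation is taken as P(l1l2l3) = l2l3l1, the occurrence probability is estimated as a simple frequency over the positions of frame p, and all names (`permute`, `complement`, `i_motif_frequency`, `X0`, `X1`, `X2`) are hypothetical. Only the 20 non-homopolymer trinucleotides of τ0 are permuted, since the text excludes AAA, CCC, GGG and TTT from the permutation property.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
ALL_TRINUCLEOTIDES = {"".join(t) for t in product("ACGT", repeat=3)}
HOMOPOLYMERS = {"AAA", "CCC", "GGG", "TTT"}

# Subset tau0 copied from the text (22 trinucleotides).
TAU0 = set(
    "AAA AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC GAG GAT GCC "
    "GGC GGT GTA GTC GTT TAC TTC TTT".split()
)


def permute(w):
    """Circular permutation, ASSUMED here to be l1l2l3 -> l2l3l1."""
    return w[1:] + w[0]


def complement(w):
    """Complementarity map of the text: C(l1l2l3) = C(l3)C(l2)C(l1)."""
    return "".join(COMPLEMENT[letter] for letter in reversed(w))


def i_motif_frequency(gene, w, p, i, w_prime):
    """Frequency of the i-motif w_p (N)_i w' in `gene`.

    ASSUMPTION: the probability is estimated as (number of frame-p
    positions where w is followed, after i arbitrary bases, by w')
    divided by (number of frame-p positions where the motif fits).
    """
    motif_length = 6 + i
    hits = 0
    total = 0
    for start in range(p, len(gene) - motif_length + 1, 3):
        total += 1
        first = gene[start:start + 3]
        second = gene[start + 3 + i:start + motif_length]
        if first == w and second == w_prime:
            hits += 1
    return hits / total if total else 0.0


# The 20 trinucleotides of tau0 without AAA and TTT, and their permutations.
X0 = TAU0 - HOMOPOLYMERS
X1 = {permute(w) for w in X0}
X2 = {permute(w) for w in X1}

if __name__ == "__main__":
    print("tau0 self-complementary:", {complement(w) for w in TAU0} == TAU0)
    print("C(P(X0)) == P(P(X0)):", {complement(w) for w in X1} == X2)
    print("partition of the 64 trinucleotides:",
          X0 | X1 | X2 | HOMOPOLYMERS == ALL_TRINUCLEOTIDES)
    # Toy gene, not real data: AAA followed directly by CCC in frame 0.
    print(i_motif_frequency("AAACCCAAACCC", "AAA", 0, 0, "CCC"))
```

Under these assumptions, the three permuted sets are pairwise disjoint and, together with the four homopolymers, cover all 64 trinucleotides, which is consistent with the 22 + 21 + 21 split reported above. The sketch does not decide which of CCC and GGG belongs to τ1 or τ2, since the text does not state it.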