On the optimality of the genetic code
The contemporary genetic code (GK) has been known for more than two decades (Nirenberg et al., 1965). In this paper we try to answer the question of which considerations led to this choice, by directing our attention to some combinatorial and information characteristics of GK. The following features of GK are known (Dayhoff and McLaughlin, 1972): a) each amino acid is coded by triplet(s), b) the same amino acid can be coded by several triplets, c) the same triplet cannot code for more than one amino acid, d) each amino acid sequence is coded by a triplet sequence, e) triplet-sequence decoding proceeds sequentially, triplet after triplet.

Let $A(t)=\{a_1,\dots,a_t\}$ and $B=\{b_1,\dots,b_4\}$ be alphabets, and let $S(B)$ be the set of triplets formed from the letters of $B$. Let $k=(k_1,\dots,k_t)$ be a set of subsets of $S(B)$ satisfying the conditions: 1) $k_i \cap k_j = \emptyset$ for $i \neq j$; 2) $\bigcup_{i=1}^{t} k_i = S(B)$. From 1) and 2) it follows that $\sum_{i=1}^{t} |k_i| = |S(B)|$, where $|k_i|$ denotes the number of elements in the set $k_i$. It is clear that any isomorphism $F(k): k \to A(t)$ defines a code $C(k)$ satisfying conditions a), b), c), d), and e). Let $K(A(t),B)$ denote the set of all such codes. Recall that for $t=20$, $\mathrm{GK} \in K(A(t),B)$.

If $C(k) \in K(A(t),B)$ and $A(t)$ is an alphabet for which the probabilities $p_i$ of appearance of $a_i$ in the messages are known, then the number of ways of encoding a message of length $M$ by $C(k)$ is $\left(\prod_{i=1}^{t} |k_i|^{p_i}\right)^{M}$. That is, maximization of $\prod_{i=1}^{t} |k_i|^{p_i}$ on $K(A(t),B)$ leads to the code $C(\hat{k})$ allowing the maximum number of encodings for a message of arbitrary length $M$. This is equivalent to the following discrete optimization problem:

(1) $\max\left\{\prod_{i=1}^{t} |k_i|^{p_i} : C(k) \in K(A(t),B)\right\}$

which is equivalent to

(2) $\max\left\{\prod_{i=1}^{t} |k_i|^{p_i} : \sum_{i=1}^{t} |k_i| = |S(B)|,\ |k_i| \geq 1\right\}$

in the sense that it characterizes only the capacity of the sets $k_i$, without consideration of their inner arrangement. The continuous variant of (2) is the problem

(3) $\max\left\{\prod_{i=1}^{t} x_i^{p_i} : \sum_{i=1}^{t} x_i = |S(B)|,\ x_i \geq 0\right\}$.
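The continuous problem (3) can be handled with a standard Lagrange-multiplier argument. The sketch below is not taken from the paper; it assumes that the probabilities are positive and sum to one, and it introduces the shorthand $N = |S(B)|$ purely for illustration.

```latex
% Assumptions (not in the original): N = |S(B)|, p_i > 0, \sum_i p_i = 1.
% Taking logarithms turns the product into a concave, separable objective:
\[
\max_{x}\ \sum_{i=1}^{t} p_i \ln x_i
\qquad \text{subject to } \sum_{i=1}^{t} x_i = N .
\]
\begin{align*}
\mathcal{L}(x,\lambda) &= \sum_{i=1}^{t} p_i \ln x_i
   - \lambda\Bigl(\sum_{i=1}^{t} x_i - N\Bigr), \\
\frac{\partial \mathcal{L}}{\partial x_i} &= \frac{p_i}{x_i} - \lambda = 0
   \;\Longrightarrow\; x_i = \frac{p_i}{\lambda}, \\
N &= \sum_{i=1}^{t} x_i = \frac{1}{\lambda}\sum_{i=1}^{t} p_i = \frac{1}{\lambda}
   \;\Longrightarrow\; x_i = p_i\,N .
\end{align*}
```

Because the logarithmic objective is concave on $x_i > 0$, this stationary point is the global maximum, so under these assumptions the optimal capacities are proportional to the probabilities $p_i$.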
The solution of (3) is a vector $\hat{x}=(\hat{x}_1,\dots,\hat{x}_t)$ for which $\hat{x}_i = p_i\,|S(B)|$, $i=1,\dots,t$. For the case of GK ($t=20$, with $p_i$, $i=1,\dots,20$, the probabilities of appearance of the amino acids; Barker et al., 1984), $\hat{x}$ gives a good approximation to the capacity of the synonym sets coding for the same amino acid (Table 1). Obviously, in the case of a "good" inner arrangement of the sets $k_i$, simple mutations will rarely change the sense of the coded message.
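As a rough numerical illustration of problems (2) and (3), the following Python sketch computes the continuous capacities $p_i\,|S(B)|$ and an integer allocation for the discrete problem. The function names, the toy probabilities, and the greedy marginal-gain procedure are assumptions introduced here for illustration. The greedy step is a standard technique for separable concave objectives and is not claimed to be the paper's method, and the probabilities are not the amino-acid frequencies of Barker et al.

```python
import heapq
import math


def continuous_capacities(p, n_codewords):
    """Maximizer of the continuous problem (3): x_i = p_i * |S(B)|."""
    return [pi * n_codewords for pi in p]


def discrete_capacities(p, n_codewords):
    """Integer allocation for problem (2) by greedy marginal gains.

    Every set starts with one codeword (|k_i| >= 1). Each remaining
    codeword goes to the set whose log-objective gain
    p_i * ln((k_i + 1) / k_i) is currently largest.
    """
    t = len(p)
    if n_codewords < t:
        raise ValueError("need at least one codeword per symbol")
    sizes = [1] * t
    # Max-heap via negated gains; the first gain is for growing 1 -> 2.
    heap = [(-p[i] * math.log(2.0), i) for i in range(t)]
    heapq.heapify(heap)
    for _ in range(n_codewords - t):
        _, i = heapq.heappop(heap)
        sizes[i] += 1
        gain = p[i] * math.log((sizes[i] + 1) / sizes[i])
        heapq.heappush(heap, (-gain, i))
    return sizes


def log_objective(p, sizes):
    """Logarithm of the product prod_i |k_i|^{p_i}."""
    return sum(pi * math.log(k) for pi, k in zip(p, sizes))


if __name__ == "__main__":
    # Hypothetical toy probabilities for a 4-symbol alphabet A(t).
    p = [0.5, 0.25, 0.125, 0.125]
    n = 4 ** 3  # triplets over a 4-letter alphabet B
    print(continuous_capacities(p, n))  # [32.0, 16.0, 8.0, 8.0]
    print(discrete_capacities(p, n))    # [32, 16, 8, 8]
```

In this toy case the continuous optimum happens to be integral, so the two allocations coincide. With realistic frequencies the integer allocation would only approximate $p_i\,|S(B)|$.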
[1] M. Nirenberg et al. RNA codewords and protein synthesis, VII. On the general nature of the RNA code. Proceedings of the National Academy of Sciences of the United States of America, 1965.
[2] A. Figureau et al. The logic of the genetic code: Synonyms and optimality against effects of mutations. Origins of Life, 2004.