On the information content of cytochrome c.

Abstract A previously published mathematical method, which makes use of functionally equivalent (synonymous) residues, is applied to calculate the information content that the genetic system requires to specify at least one synonymous residue at a given site in cytochrome c. Using only those residues known to be synonymous, the information content of the genetic message for the 101 sites is 373.83 bits, or 3.701 bits per residue. A prescription based on a paper by Grantham (1974) has been written that allows one to include residues predicted to be synonymous from their similarity in composition, polarity and volume. Including such residues, the information content of the genetic message for the 101 sites is 298.21 bits, or 2.953 bits per residue. Both figures are substantially less than the maximum information content of a doublet genetic code, 4.00 bits per residue. A doublet code can code for no more than 16 residues, but we find that cytochrome c could be coded with only 15. The suggestion of Jukes (1965) that a doublet code may have been used by vanished forms of life must therefore be taken seriously, since cytochrome c could be represented in such a code. We suggest a primeval doublet code similar to that of Jukes and show how it could have evolved into the modern degenerate triplet code. A non-degenerate triplet code is more subject to error than a doublet code. The redundancy of a degenerate triplet code provides some error protection, and degeneracy must therefore have evolved simultaneously with it. The vulnerability of the human and yeast cytochrome c sequences to error by single base interchange is calculated, including the redundancy due to the synonymous residues. The error protection due to genetic code degeneracy and to the synonymous residues was available from the earliest stages in the origin of life. These factors prove to be very much weaker than modern biological factors such as diploidy and repair processes.
It is concluded that the modern triplet code was fully fixed before the appearance of these purely biological error correction mechanisms.
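
The arithmetic behind the per-residue figures and the 4.00-bit doublet ceiling can be checked with a short sketch. It is not the method of the paper: the totals 373.83 and 298.21 bits are taken from the abstract as given, and the helper names, the four-letter nucleotide alphabet and the assumption of equiprobable codons are illustrative choices made here.

```python
import math

# Figures quoted in the abstract (taken as given, not recomputed here).
SITES = 101
BITS_KNOWN_SYNONYMS = 373.83      # only residues known to be synonymous
BITS_PREDICTED_SYNONYMS = 298.21  # plus residues predicted to be synonymous


def bits_per_residue(total_bits: float, sites: int = SITES) -> float:
    """Average information content per site."""
    return total_bits / sites


def max_bits_per_codon(codon_length: int, alphabet_size: int = 4) -> float:
    """Upper bound on information per codon, assuming (hypothetically) that
    all alphabet_size ** codon_length codons are distinct and equiprobable."""
    return codon_length * math.log2(alphabet_size)


known = bits_per_residue(BITS_KNOWN_SYNONYMS)
predicted = bits_per_residue(BITS_PREDICTED_SYNONYMS)
doublet_max = max_bits_per_codon(2)  # 4**2 = 16 codons
triplet_max = max_bits_per_codon(3)  # 4**3 = 64 codons

print(f"known synonyms:     {known:.3f} bits/residue")
print(f"predicted synonyms: {predicted:.3f} bits/residue")
print(f"doublet code limit: {doublet_max:.2f} bits/residue")
print(f"triplet code limit: {triplet_max:.2f} bits/residue")
print(f"below doublet limit: {known < doublet_max and predicted < doublet_max}")
```

Under these assumptions both per-residue figures fall below the doublet limit, which is the comparison the abstract draws; an alphabet of 15 residues would need at most log2(15) bits per site, also within that limit.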

[1]  W. Fitch,et al.  The Properties and Amino-acid Sequence of Cytochrome c from Euglena gracilis , 1973, Nature.

[2]  W. Fitch,et al.  Amino acid sequence of a cytochrome c from the common Pacific lamprey, Entosphenus tridentatus. , 1973, Biochemistry.

[3]  J. Ramshaw,et al.  The amino acid sequence of cytochrome c from Helix aspersa Müller (garden snail). , 1972, Biochemical Journal.

[4]  E. Margoliash,et al.  Differential Binding Properties of Cytochrome c: Possible Relevance for Mitochondrial Ion Transport , 1970, Nature.

[5]  Francis Crick,et al.  The Genetic Code , 1962 .

[6]  H. P. Yockey,et al.  An application of information theory to the Central Dogma and the Sequence Hypothesis. , 1974, Journal of theoretical biology.

[7]  E. Margoliash,et al.  The primary structure of cytochrome c from the rust fungus Ustilago sphaerogena. , 1972, The Biochemical journal.

[8]  M. Wallis On the frequency of arginine in proteins and its implications for molecular evolution. , 1974, Biochemical and biophysical research communications.

[9]  F. Crick Origin of the Genetic Code , 1967, Nature.

[10]  T. Jukes,et al.  Estimation of evolutionary changes in certain homologous polypeptide chains. , 1972, Journal of molecular biology.

[11]  D. Boulter,et al.  The amino acid sequence of cytochrome c from Allium porrum L. (leek). , 1973, The Biochemical journal.

[12]  H. P. Yockey,et al.  A prescription which predicts functionally equivalent residues at given sites in protein sequences. , 1977, Journal of theoretical biology.

[13]  Martynas Yčas,et al.  The biological code , 1969 .

[14]  D. Boulter,et al.  The amino acid sequence of cytochrome c from Nigella damascena L. (love-in-a-mist). , 1973, The Biochemical journal.

[15]  A systematist looks at cytochrome c , 1972, Journal of Molecular Evolution.

[16]  P. Slonimski,et al.  Formal analysis of protein sequences. I. Specific long-range constraints in pair associations of amino acids. , 1967, Journal of theoretical biology.

[17]  T H Jukes,et al.  Amino acid composition of proteins: Selection against the genetic code. , 1975, Science.

[18]  M. Scheulen,et al.  Solid‐phase Edman degradation of a protein: N‐terminal sequence of cytochrome c from Candida krusei , 1973, FEBS letters.

[19]  Thomas Uzzell,et al.  Fitting Discrete Probability Distributions to Evolutionary Events , 1971, Science.

[20]  A. Mclachlan,et al.  Repeating sequences and gene duplication in proteins. , 1972, Journal of molecular biology.

[21]  G. W. Pettigrew,et al.  The Amino-acid Sequence of Cytochrome c from Euglena gracilis , 1973, Nature.

[22]  C. Anfinsen Principles that govern the folding of protein chains. , 1973, Science.

[23]  R. Grantham Amino Acid Difference Formula to Help Explain Protein Evolution , 1974, Science.

[24]  T. Jukes Coding triplets and their possible evolutionary implications. , 1965, Biochemical and biophysical research communications.

[25]  J. P. Riehm,et al.  Proteins of the thermophilic fungus Humicola lanuginosa. I. Isolation and amino acid sequence of a cytochrome C. , 1972, The Journal of biological chemistry.

[26]  A. Goldberg,et al.  Genetic Code: Aspects of Organization , 1966, Science.

[27]  R. G. Harrison,et al.  The Harvey Lectures , 1935 .

[28]  E. Margoliash,et al.  Identification of missense mutants by amino acid replacements in iso-1-cytochrome c from yeast. , 1974, The Journal of biological chemistry.

[29]  T. Ohta,et al.  Amino Acid Composition of Proteins as a Product of Molecular Evolution , 1971, Science.

[30]  P. Slonimski,et al.  Formal analysis of protein sequences. II. Method for structural studies of homologous proteins amino acid substitutions in cytochromes c. , 1968, Journal of theoretical biology.

[31]  D. Boulter,et al.  The amino acid sequences of cytochrome c from four plant sources. , 1974, The Biochemical journal.

[32]  M. Sokolovsky,et al.  Primary structure of cytochrome c from the camel, Camelus dromedarius. , 1972, Biochemistry.

[33]  G. Pettigrew The amino acid sequence of a cytochrome c from a protozoan Crithidia oncopelti , 1972, FEBS letters.

[34]  D. Boulter,et al.  The amino acid sequence of cytochrome c from Spinacea oleracea L. (spinach). , 1973, The Biochemical journal.

[35]  J. L. King,et al.  Non-Darwinian evolution. , 1969, Science.

[36]  H. Bremermann,et al.  A method for calculating codon frequencies in DNA. , 1972, Journal of theoretical biology.

[37]  A. L. Mackay,et al.  Optimization of the Genetic Code , 1967, Nature.

[38]  J. Stewart,et al.  Confirmation of UAG as a nonsense codon in bakers' yeast by amino acid replacements of glutamic acid 71 in iso-1-cytochrome c. , 1973, Journal of molecular biology.

[39]  A. Mclachlan Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c551. , 1971, Journal of molecular biology.

[40]  Claude E. Shannon,et al.  Prediction and Entropy of Printed English , 1951 .