Protein structure and the sequential structure of mRNA: α‐Helix and β‐sheet signals at the nucleotide level

We directly compared experimentally determined protein structures with their corresponding protein-coding mRNA sequences, asking whether real-world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows a biased selection of codons, which may control the translation rate of the ribosome and may thus, in vivo, have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in the mRNA sequence regions surrounding helices and sheets. These signals originate not from the clustering of rare codons but from the similarity of codons coding for very abundant amino acid residues at the N‐ and C‐termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S‐like ribosomal RNA sequences and related to mechanisms by which the ribosome maintains the correct reading frame. © 1996 Wiley‐Liss, Inc.
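To make the rare-codon hypothesis concrete, the following is a minimal Python sketch of how clusters of rare codons might be located in a coding sequence. It is an illustration only and not the procedure used in this study, which relied on neural network analysis. The function names, the frequency threshold, the window length, and the minimum count are all hypothetical choices, and codon frequencies are estimated naively from whatever reference sequences are supplied.

```python
from collections import Counter


def split_codons(mrna):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def codon_frequencies(sequences):
    """Estimate the relative frequency of each codon over a set of reference sequences."""
    counts = Counter(c for seq in sequences for c in split_codons(seq))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def rare_codon_windows(mrna, freqs, threshold=0.1, window=5, min_rare=3):
    """Return start indices (in codons) of windows holding at least min_rare rare codons.

    A codon counts as rare when its reference frequency is below the
    (hypothetical) threshold; codons absent from the reference are treated as rare.
    """
    codons = split_codons(mrna)
    rare = [freqs.get(c, 0.0) < threshold for c in codons]
    hits = []
    for start in range(len(codons) - window + 1):
        if sum(rare[start:start + window]) >= min_rare:
            hits.append(start)
    return hits
```

The window positions returned by `rare_codon_windows` could then be compared with the boundaries of helices and sheets taken from a secondary structure assignment; that comparison is the correlation the abstract reports as absent.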

[1]  C. Kurland,et al.  Codon usage determines translation rate in Escherichia coli. , 1989, Journal of molecular biology.

[2]  R Langridge,et al.  Improvements in protein secondary structure prediction by an enhanced neural network. , 1990, Journal of molecular biology.

[3]  Jan Mrázek,et al.  Occurrence of nucleotide triplets in genes and secondary structure of the coded proteins , 1987 .

[4]  E. Trifonov Translation framing code and frame-monitoring mechanism as suggested by the analysis of mRNA and 16 S rRNA nucleotide sequences. , 1987, Journal of molecular biology.

[5]  G. Rose,et al.  Helix signals in proteins. , 1988, Science.

[6]  M. Karplus,et al.  Protein secondary structure prediction with a neural network. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[7]  W. Fiers,et al.  Folding of the MS2 coat protein in Escherichia coli is modulated by translational pauses resulting from mRNA secondary structure and codon usage: a hypothesis. , 1993, Journal of theoretical biology.

[8]  U. Hobohm,et al.  Selection of representative protein data sets , 1992, Protein science : a publication of the Protein Society.

[9]  H. Noller Ribosomal RNA and translation. , 1991, Annual review of biochemistry.

[10]  V. Lim Structural principles of the globular organization of protein chains. A stereochemical theory of globular protein secondary structure. , 1974, Journal of molecular biology.

[11]  P. Y. Chou,et al.  Empirical predictions of protein conformation. , 1978, Annual review of biochemistry.

[12]  J. Garnier,et al.  Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. , 1978, Journal of molecular biology.

[13]  V. Lim Algorithms for prediction of α-helical and β-structural regions in globular proteins , 1974 .

[14]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[15]  R. Weiss,et al.  Towards a genetic dissection of the basis of triplet decoding, and its natural subversion: programmed reading frame shifts and hops. , 1991, Annual review of genetics.

[16]  P. Y. Chou,et al.  Prediction of the secondary structure of proteins from their amino acid sequence. , 2006 .

[17]  J. Lake,et al.  DNA-hybridization electron microscopy. Localization of five regions of 16 S rRNA on the surface of 30 S ribosomal subunits. , 1990, Journal of molecular biology.

[18]  E N Trifonov,et al.  Recognition of correct reading frame by the ribosome , 1992, Biochimie.

[19]  I. Adzhubei,et al.  Nonuniform size distribution of nascent globin peptides, evidence for pause localization sites, and a cotranslational protein-folding model , 1991, Journal of protein chemistry.

[20]  Fred E. Cohen,et al.  β-Breakers: An aperiodic secondary structure , 1991 .

[21]  Malcolm J. McGregor,et al.  Prediction of β-turns in proteins using neural networks , 1989 .

[22]  S. Brunak,et al.  Protein secondary structure and homology by neural networks The α‐helices in rhodopsin , 1988 .

[23]  G. von Heijne,et al.  Translation rate modification by preferential codon usage: intragenic position effects. , 1987, Journal of Theoretical Biology.

[24]  M Bulmer Codon usage and secondary structure of MS2 phage RNA. , 1989, Nucleic acids research.

[25]  A. Brown,et al.  Protein folding within the cell is influenced by controlled rates of polypeptide elongation. , 1992, Journal of molecular biology.

[26]  Paul M. Sharp,et al.  Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes , 1986, Nucleic Acids Res..

[27]  J. Shine,et al.  The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. , 1974, Proceedings of the National Academy of Sciences of the United States of America.

[28]  G. Rose,et al.  Hydrogen bonding, hydrophobicity, packing, and protein folding. , 1993, Annual review of biophysics and biomolecular structure.

[29]  S. Knudsen,et al.  Prediction of human mRNA donor and acceptor sites from the DNA sequence. , 1991, Journal of molecular biology.

[30]  S Pongor,et al.  The SBASE domain library: a collection of annotated protein segments. , 1993, Protein engineering.

[31]  S. Brunak,et al.  Neural network model of the genetic code is strongly correlated to the GES scale of amino acid transfer free energies. , 1994, Journal of molecular biology.

[32]  C. Woese,et al.  Conservation of primary structure in 16S ribosomal RNA , 1975, Nature.

[33]  T. D. Schneider,et al.  Sequence logos: a new way to display consensus sequences. , 1990, Nucleic acids research.

[34]  W. Kabsch,et al.  How good are predictions of protein secondary structure? , 1983, FEBS letters.

[35]  Søren Brunak,et al.  Doing Sequence Analysis by Inspecting the Order in which Neural Networks Learn , 1993 .

[36]  C. E. SHANNON,et al.  A mathematical theory of communication , 1948, MOCO.

[37]  J. Maizel,et al.  Identification of unusual RNA folding patterns encoded by bacteriophage T4 gene 60 , 1993, Gene.

[38]  H. Noller Topography of 16S RNA in 30S ribosomal subunits. Nucleotide sequences and location of sites of reaction with kethoxal. , 1974, Biochemistry.

[39]  P. Walter,et al.  Discrete nascent chain lengths are required for the insertion of presecretory proteins into microsomal membranes , 1993, The Journal of cell biology.

[40]  B. Matthews Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.

[41]  G. Stormo,et al.  Translational initiation in prokaryotes. , 1981, Annual review of microbiology.

[42]  J. Sambrook,et al.  Protein folding in the cell , 1992, Nature.

[43]  T Gojobori,et al.  Codon usage tabulated from the GenBank Genetic Sequence Data. , 1988, Nucleic acids research.

[44]  P. Tekamp-Olson,et al.  The isolation, characterization, and sequence of the pyruvate kinase gene of Saccharomyces cerevisiae. , 1983, The Journal of biological chemistry.

[45]  J. Janin,et al.  Structural domains in proteins and their role in the dynamics of protein function. , 1983, Progress in biophysics and molecular biology.

[46]  T. Sejnowski,et al.  Predicting the secondary structure of globular proteins using neural network models. , 1988, Journal of molecular biology.

[47]  S. Hayward,et al.  Limits on α‐helix prediction with neural network models , 1992 .

[48]  P. Sharp,et al.  Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons. , 1986, Nucleic acids research.

[49]  R. Weiss,et al.  Ribosome gymnastics—Degree of difficulty 9.5, style 10.0 , 1990, Cell.

[50]  A. Brown,et al.  The efficiency of folding of some proteins is increased by controlled rates of translation in vivo. A hypothesis. , 1987, Journal of molecular biology.

[51]  W. Pearson Rapid and sensitive sequence comparison with FASTP and FASTA. , 1990, Methods in enzymology.

[52]  G. Candelas,et al.  Features of the cell-free translation of a spider fibroin mRNA. , 1989, Biochemistry and cell biology = Biochimie et biologie cellulaire.

[53]  C. Zhang,et al.  A graphic approach to analyzing codon usage in 1562 Escherichia coli protein coding sequences. , 1994, Journal of molecular biology.

[54]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[55]  P. Wollenzien,et al.  Sites of contact of mRNA with 16S rRNA and 23S rRNA in the Escherichia coli ribosome. , 1991, Biochemistry.

[56]  P. Wollenzien,et al.  Arrangement of messenger RNA on Escherichia coli ribosomes with respect to 10 16S rRNA cross-linking sites. , 1994, Biochemistry.