Large-scale analysis of conserved rare codon clusters suggests an involvement in co-translational molecular recognition events

MOTIVATION: A growing body of experimental and computational evidence suggests that rare codon clusters are functionally important for protein activity. However, most studies of rare codon clusters have examined only a limited number of proteins or protein families. Here, we present the Sherlocc program and show how it can be used for large-scale analysis of protein families, to find evolutionarily conserved rare codon clusters and relate them to protein function and structure. We applied this analysis to the whole Pfam database, which covers over 70% of the known protein sequence universe. Sherlocc detects statistically significant conserved rare codon clusters and produces a user-friendly HTML output.

RESULTS: Statistically significant rare codon clusters were detected in a multitude of Pfam protein families. The most significant clusters were found predominantly in N-terminal Pfam families. Many of the longest rare codon clusters occur in membrane-related proteins that must interact with other proteins as part of their function, for example during targeting or insertion. We also identified cases where rare codon clusters may play a regulatory role in the folding of catalytically important domains. Our results support a widespread functional role for rare codon clusters across species. Finally, we developed an online, filter-based search interface that provides access to Sherlocc results for all Pfam families.

AVAILABILITY: The Sherlocc program and search interface are open access and available at http://bcb.med.usherbrooke.ca
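The abstract does not describe how Sherlocc scores clusters, so the Python sketch below is only a hypothetical illustration of the general idea. It counts rare codons at each column of a codon-level family alignment, pools the counts over a sliding window, and asks whether a window holds more rare codons than the alignment-wide background would predict. The rare-codon cutoff (`RARE_THRESHOLD`), the `---` gap symbol, the window size, the one-sided binomial test, and the codon usage values are all assumptions made for illustration. None of them are taken from Sherlocc.

```python
"""Hypothetical sketch: flag conserved rare codon clusters in a codon alignment.

Assumptions (illustrative only, not Sherlocc's documented method):
- a codon is "rare" if its relative usage is below RARE_THRESHOLD;
- sequences are codon-aligned strings of equal length, with "---" as gap;
- a window is significant if its rare-codon count is unlikely under a
  one-sided binomial test against the alignment-wide rare-codon rate.
"""
from math import comb

RARE_THRESHOLD = 0.10  # hypothetical cutoff on relative codon usage
GAP = "---"            # hypothetical gap symbol for a missing codon


def rare_codons(usage: dict[str, float], threshold: float = RARE_THRESHOLD) -> set[str]:
    """Return the codons whose usage frequency falls below the threshold."""
    return {codon for codon, freq in usage.items() if freq < threshold}


def binomial_sf(k: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def split_codons(seq: str) -> list[str]:
    """Cut an in-frame nucleotide string into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def conserved_rare_windows(alignment: list[str], rare: set[str],
                           window: int = 3, alpha: float = 0.01):
    """Return (start_column, rare_count, codon_count, p_value) for every
    window of `window` alignment columns whose rare-codon count is
    significantly higher than the alignment-wide background rate."""
    if not alignment:
        return []
    rows = [split_codons(s) for s in alignment]
    ncols = len(rows[0])
    # Per column: (rare codons, non-gap codons), pooled over all sequences.
    col_counts = []
    for j in range(ncols):
        codons = [row[j] for row in rows if row[j] != GAP]
        col_counts.append((sum(c in rare for c in codons), len(codons)))
    total_codons = sum(n for _, n in col_counts)
    if total_codons == 0:
        return []
    background = sum(k for k, _ in col_counts) / total_codons
    hits = []
    for start in range(ncols - window + 1):
        block = col_counts[start:start + window]
        k = sum(rc for rc, _ in block)
        n = sum(nc for _, nc in block)
        if n == 0:
            continue
        p_value = binomial_sf(k, n, background)
        if p_value < alpha:
            hits.append((start, k, n, p_value))
    return hits


# Toy example: invented usage values and a four-sequence codon alignment.
usage = {"CTG": 0.50, "CTA": 0.04, "CGT": 0.38, "AGA": 0.04,
         "AGG": 0.02, "GAA": 0.69, "GAG": 0.31}
alignment = [
    "CTGGAACGTAGACTAAGGGAACTG",
    "CTGGAGCGTAGGAGACTAGAGCTG",
    "---GAACGTCTAAGGAGAGAACGT",
    "CTGGAGCTGAGAAGACTAGAACTG",
]
hits = conserved_rare_windows(alignment, rare_codons(usage))
print(hits)  # one hit, starting at alignment column 3
```

In the toy alignment every sequence carries a rare codon at columns 3 to 5, so only the window starting at column 3 passes `alpha`; the partially overlapping windows do not. A real analysis would also need organism-specific codon usage tables and a correction for testing many windows, both of which this sketch omits.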
