A relationship between mRNA expression levels and protein solubility in E. coli.

Each step in the process of gene expression, from the transcription of DNA into mRNA to the folding and posttranslational modification of proteins, is regulated by complex cellular mechanisms. At the same time, stringent conditions on the physicochemical properties of proteins, and hence on the nature of their amino acids, are imposed by the need to avoid aggregation at the concentrations required for optimal cellular function. A relationship is therefore expected to exist between mRNA expression levels and protein solubility in the cell. By investigating such a relationship, we formulate a method that enables the prediction of the maximal levels of mRNA expression in Escherichia coli with an accuracy of 83% and of the solubility of recombinant human proteins expressed in E. coli with an accuracy of 86%.
