Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins

Protein folding often competes with intermolecular aggregation, which in most cases irreversibly impairs protein function, as exemplified by the formation of inclusion bodies. Although it has been empirically determined that some proteins tend to aggregate, the relationship between a protein's aggregation propensity and its primary sequence remains poorly understood. Here, we individually synthesized the entire ensemble of Escherichia coli proteins by using an in vitro reconstituted translation system and analyzed their aggregation propensities. Because the reconstituted translation system is chaperone-free, we could evaluate the inherent aggregation propensities of thousands of proteins in a translation-coupled manner. A histogram of the solubilities, based on data from 3,173 translated proteins, revealed a clear bimodal distribution, indicating that the aggregation propensities are not evenly distributed across a continuum. Instead, the proteins can be categorized into two groups: soluble proteins and aggregation-prone proteins. The aggregation propensity is most prominently correlated with the structural classification of proteins, implying that the prediction of aggregation propensity requires structural information about the protein.
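The histogram analysis described above can be illustrated with a minimal sketch. It assumes that solubility is expressed as the percentage of a translated protein recovered in the soluble fraction relative to the total; the abstract does not state this definition, so treat it as an assumption. The function names, the bin count, the simple peak-counting rule, and the toy values are all hypothetical and are not data or methods from the study.

```python
from typing import List, Sequence, Tuple


def solubility_percent(soluble: float, total: float) -> float:
    """Percentage of protein in the soluble fraction (assumed definition)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * soluble / total


def histogram(values: Sequence[float], n_bins: int = 5,
              lo: float = 0.0, hi: float = 100.0) -> List[int]:
    """Count values in n_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that v == hi falls in the last bin and v < lo in the first.
        idx = max(0, min(int((v - lo) / width), n_bins - 1))
        counts[idx] += 1
    return counts


def count_peaks(counts: Sequence[int]) -> int:
    """Count local maxima in the binned counts; a flat plateau counts once."""
    padded = [0] + list(counts) + [0]
    peaks = 0
    for i in range(1, len(padded) - 1):
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]:
            peaks += 1
    return peaks


# Hypothetical (soluble, total) measurements, invented for illustration only.
toy_pairs: List[Tuple[float, float]] = [
    (5, 100), (10, 100), (12, 100), (8, 100), (20, 100), (50, 100),
    (85, 100), (90, 100), (95, 100), (98, 100), (92, 100),
]

values = [solubility_percent(s, t) for s, t in toy_pairs]
counts = histogram(values)
print(counts, count_peaks(counts))
```

Two separated peaks in the binned counts, one near low solubility and one near high solubility, are what a bimodal distribution looks like at this level of description. A real analysis would need a statistical test for bimodality rather than this peak count, which is sensitive to the choice of bin width.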