What is the minimum number of letters required to fold a protein?

Experimental studies have shown that the full sequence complexity of naturally occurring proteins is not required to generate rapidly folding and functional proteins; that is, proteins can be designed with fewer than 20 letters. This raises the question: what is the minimum number of amino acid types required to encode complex protein folds? Here, we investigate this issue from three angles. First, we study the minimum sequence complexity that preserves the structural information needed to detect distantly related homologues. Second, we compare how well foldable model sequences can be designed over a wide range of reduced amino acid alphabets, to find the minimum number of letters that gives a design ability similar to that of the full 20-letter alphabet. Finally, we survey the lower bound on the alphabet size of globular proteins in a non-redundant protein database. These different approaches give a remarkably consistent view: the minimum number of letters required to fold a protein is around ten.
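
To make the notion of a reduced alphabet concrete, the sketch below maps a 20-letter sequence onto a smaller alphabet and reports two simple complexity measures: the number of distinct letters used and the Shannon entropy of the letter composition. The 10-group partition in `GROUPS` is a hypothetical grouping by rough physicochemical similarity, chosen only for illustration; it is not the alphabet derived in this work, and the function names are likewise assumptions.

```python
import math
from collections import Counter

# Hypothetical 10-group partition of the 20 standard amino acids
# (illustrative only; not the grouping used in the paper).
GROUPS = ["AG", "C", "DE", "FWY", "H", "ILMV", "KR", "NQ", "P", "ST"]

# Each residue is represented by the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq: str) -> str:
    """Rewrite a one-letter amino acid sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())


def alphabet_size(seq: str) -> int:
    """Number of distinct letters actually used in the sequence."""
    return len(set(seq))


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of the letter composition."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    original = "MKVLDESTWYAG"  # made-up sequence for demonstration
    reduced = reduce_sequence(original)
    print(original, alphabet_size(original), round(shannon_entropy(original), 3))
    print(reduced, alphabet_size(reduced), round(shannon_entropy(reduced), 3))
```

Surveying a database for the lower bound on alphabet size would amount to applying a measure such as `alphabet_size` to every globular protein sequence and taking the minimum; the real analysis may use a different complexity measure.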
