Exploring protein sequence space using knowledge-based potentials.

Knowledge-based potentials can be used to decide whether an amino acid sequence is likely to fold into a prescribed native protein structure. We use this idea to survey sequence-structure relations in protein space. In particular, we test two propositions that were found to be important for efficient evolution: (i) the sequences folding into a particular native fold form extensive neutral networks that percolate through sequence space, and (ii) the neutral networks of any two native folds approach each other to within a few point mutations. Computer simulations using two very different potential functions, M. Sippl's PROSA pair potential and a neural-network-based potential, are used to verify these claims.
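The notion of a neutral network can be illustrated with a minimal sketch of a neutral walk: starting from a sequence that is accepted for a target fold, point mutations are kept only if the mutant is still accepted and does not move back toward the start. Everything in the sketch is an assumption made for illustration. The function `toy_energy`, the burial pattern, the hydrophobic residue set, and the `max_energy` threshold are hypothetical stand-ins; they are not the PROSA pair potential or the neural-network-based potential, and the acceptance rule is not necessarily the one used in the simulations.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # assumed, simplified residue classification


def toy_energy(sequence, burial_pattern):
    """Hypothetical stand-in for a knowledge-based potential.

    Counts positions where the residue's hydrophobicity disagrees with
    whether the position is buried in the target structure.
    """
    return sum((aa in HYDROPHOBIC) != buried
               for aa, buried in zip(sequence, burial_pattern))


def folds_into(sequence, burial_pattern, max_energy):
    """Accept the sequence for the target fold if its toy energy is low enough."""
    return toy_energy(sequence, burial_pattern) <= max_energy


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def neutral_walk(start, burial_pattern, max_energy, steps, rng):
    """Random walk over point mutants that stay on the (toy) neutral network.

    A mutant is kept only if it is still accepted for the target fold and
    is at least as far from the start sequence as the current sequence.
    """
    current = start
    for _ in range(steps):
        pos = rng.randrange(len(current))
        mutant = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        if (folds_into(mutant, burial_pattern, max_energy)
                and hamming(mutant, start) >= hamming(current, start)):
            current = mutant
    return current


# Hypothetical target: four buried, six exposed, two buried positions.
pattern = [True] * 4 + [False] * 6 + [True] * 2
start = "AVLIKDEGSTFW"  # toy energy 0 for this pattern
end = neutral_walk(start, pattern, max_energy=1, steps=2000,
                   rng=random.Random(0))
print(end, hamming(start, end))
```

A long walk that ends far from its starting sequence while remaining accepted at every step is the kind of evidence one would look for when asking whether a neutral network extends through sequence space; with a real potential, the acceptance test would typically be based on an energy or z-score criterion instead of this toy mismatch count.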
