An in silico exploration of the neutral network in protein sequence space.

Designating amino-acid sequences that fold into a common main-chain structure as "neutral sequences" for that structure, regardless of their function or stability, we investigated the distribution of neutral sequences in protein sequence space. For four distinct target structures (alpha, beta, alpha/beta and alpha+beta types) with the same chain length of 108, we generated the respective neutral sequences using the inverse folding technique with a knowledge-based potential function. We assumed that the neutral sequences for a protein structure have Z scores higher than or equal to a fixed threshold, where the threshold is defined as the Z score of the corresponding native sequence (case 1) or a much greater Z score (case 2). An exploring walk simulation suggested that the neutral sequences mapped into the sequence space were connected with each other through straight neutral paths and formed an inherent neutral network over the sequence space. Through another exploring walk simulation, we investigated contiguous regions between or among the neutral networks for the distinct protein structures and obtained the following results. The closest approach distance between two neutral networks ranged from 5 to 29 on the Hamming distance scale and increased linearly with the threshold value. The sequences located at the "interchange" regions between two neutral networks have intermediate sequence-profile scores for both corresponding structures. Introducing a "ball" in the sequence space that contains at least one neutral sequence for each of the four structures, we found that the minimal radius of a ball centered at an arbitrary position ranged from 35 to 50, while the minimal radius of a ball centered at a certain special position ranged from 20 to 30, on the Hamming distance scale.
The relatively small Hamming distances (5-30) may support an evolutionary mechanism in which a sequence transfers from the network for one structure to the network for a more beneficial structure via the interchange regions.
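
The two distance measures used above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`hamming`, `closest_approach`, `ball_radius`) and the four-residue toy sequences are hypothetical, each neutral network is assumed to be given as an explicit list of equal-length sequences, and the search is brute force. The ball radius is computed for one given center only; finding the "special position" would require an additional search over centers.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_approach(net_a, net_b):
    """Smallest Hamming distance between any sequence in net_a and any in net_b."""
    return min(hamming(a, b) for a in net_a for b in net_b)


def ball_radius(center, networks):
    """Smallest radius of a ball centered at `center` that contains at least
    one sequence from every network: the maximum, over networks, of the
    distance from the center to that network's nearest sequence."""
    return max(min(hamming(center, s) for s in net) for net in networks)


# Toy data (assumed for illustration; real sequences in the study have length 108).
net_a = ["AAAA", "AAAC"]
net_b = ["CCCC", "ACCC"]

print(closest_approach(net_a, net_b))        # nearest pair is AAAC / ACCC
print(ball_radius("AACC", [net_a, net_b]))   # radius needed to reach both networks
```

With sampled rather than exhaustively enumerated networks, both quantities are upper bounds on the true values, since unsampled neutral sequences could lie closer.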
