The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: de novo design of an idealized leader peptidase cleavage site.

A method for the rational design of locally encoded amino acid sequence features has been developed that combines artificial neural networks with a technique for simulating molecular evolution. De novo in machina design of Escherichia coli leader peptidase (SP1) cleavage sites serves as an example application. A modular neural network system that describes sequences in terms of physicochemical properties was trained to recognize characteristic cleavage site features. In the design cycle it is used to qualify sequences, acting as the sequence fitness function. Starting from a random sequence, several cleavage site sequences were generated by simulated molecular evolution. This technique is based on a simple genetic algorithm that uses the quality values calculated by the artificial neural network as a heuristic for inductive sequence optimization. Simulated in vivo mutation and selection allow the identification of predominant sequence positions in Escherichia coli signal peptide cleavage site regions (positions -2 and -6). Various amino acid distance maps are used to define metrics for the step size of mutations. Position-specific mutability values indicate which sequence positions were exposed to high or low selection pressure in the simulations. The use of different distance maps leads to different courses of optimization and to different idealized sequences. It is concluded that amino acid distances are context dependent. Furthermore, a method for identifying local optima during sequence optimization is presented.
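As a rough illustration of the design cycle described above, the following Python sketch couples a fitness function with distance-guided mutation and selection. It is not the authors' implementation. Several pieces are assumptions made for illustration only: the (1,λ) selection scheme (λ offspring per generation, best offspring becomes the next parent), the Gaussian weighting of replacement residues by their distance from the current residue, the use of the Kyte-Doolittle hydropathy scale as the single distance map, and the hypothetical `toy_fitness` function, which merely stands in for the trained neural network. The parameter `sigma` plays the role of the mutation step size, and the per-position change frequency returned by `evolve` is a simple proxy for position-specific mutability.

```python
import math
import random

# Kyte-Doolittle hydropathy values, used here only to build ONE example
# amino acid distance map (an assumption; other maps can be swapped in).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(HYDROPATHY)


def distance(a, b):
    """Example distance map: absolute difference in hydropathy."""
    return abs(HYDROPATHY[a] - HYDROPATHY[b])


def mutate(seq, sigma, p_mut, rng):
    """Mutate each position with probability p_mut.

    A replacement residue b for residue a is drawn with weight
    exp(-d(a, b)^2 / (2 * sigma^2)), so small steps in the distance map
    are favoured and sigma sets the typical step size (assumed scheme).
    """
    out = []
    for a in seq:
        if rng.random() < p_mut:
            others = [b for b in AMINO_ACIDS if b != a]
            weights = [math.exp(-distance(a, b) ** 2 / (2 * sigma ** 2))
                       for b in others]
            a = rng.choices(others, weights=weights)[0]
        out.append(a)
    return "".join(out)


def toy_fitness(seq, target):
    """Hypothetical stand-in for the neural network: higher is better."""
    return -sum(distance(a, b) for a, b in zip(seq, target))


def evolve(fitness, length, generations=200, offspring=50,
           sigma=1.0, p_mut=0.2, seed=0):
    """(1, lambda)-style simulated evolution starting from a random sequence."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    best, best_fit = parent, fitness(parent)
    changes = [0] * length  # how often each position changed in the parent
    for _ in range(generations):
        children = [mutate(parent, sigma, p_mut, rng) for _ in range(offspring)]
        child = max(children, key=fitness)  # selection: best offspring only
        for i, (x, y) in enumerate(zip(parent, child)):
            if x != y:
                changes[i] += 1
        parent = child
        fit = fitness(parent)
        if fit > best_fit:
            best, best_fit = parent, fit
    mutability = [c / generations for c in changes]
    return best, best_fit, mutability


if __name__ == "__main__":
    target = "ALAVAGA"  # hypothetical target, not taken from the paper
    best, score, mut = evolve(lambda s: toy_fitness(s, target), len(target))
    print(best, score)
    print([round(m, 2) for m in mut])
```

Replacing `distance` with a different amino acid distance map changes which substitutions count as small steps, which is one way to see how different maps can produce different courses of optimization and different end sequences.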
