Directed molecular evolution by machine learning and the influence of nonlinear interactions.

Alternative search strategies for the directed evolution of proteins are presented and compared. In particular, two machine learning strategies based on partial least-squares regression are developed: the first contains only linear terms, each representing a given residue's independent contribution to fitness; the second contains additional nonlinear terms to account for potential epistatic coupling between residues. The nonlinear modeling strategy is further divided into two types: one that contains all possible nonlinear terms, and another that uses a genetic algorithm to select a subset of important interaction terms. The performance of each modeling type is analysed as a function of training set size. Simulated molecular evolution on a synthetic protein landscape shows that using machine learning techniques to guide library design can be a powerful addition to library generation methods such as DNA shuffling.
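To make the distinction between the linear and nonlinear model types concrete, the following is a minimal sketch of how the regression inputs might be built. The abstract does not specify the encoding, so the indicator (one-hot) scheme, the two-letter alphabet, and the function names `linear_terms`, `interaction_terms`, and `design_row` are all illustrative assumptions, not the authors' method. The optional `pairs` argument stands in for a subset of interaction terms such as one a genetic algorithm might select; the selection step itself is not shown.

```python
from itertools import combinations


def linear_terms(seq, alphabet):
    """One indicator per (position, residue): independent contributions."""
    return [1.0 if res == a else 0.0 for res in seq for a in alphabet]


def interaction_terms(seq, alphabet, pairs=None):
    """Products of indicators for pairs of positions (candidate epistatic terms).

    pairs=None uses every position pair; passing a list of (i, j) tuples
    keeps only a chosen subset of interactions (hypothetical selection).
    """
    if pairs is None:
        pairs = list(combinations(range(len(seq)), 2))
    feats = []
    for i, j in pairs:
        for a in alphabet:
            for b in alphabet:
                feats.append(1.0 if (seq[i] == a and seq[j] == b) else 0.0)
    return feats


def design_row(seq, alphabet, pairs=None, nonlinear=True):
    """Feature row for one sequence: linear terms, plus interactions if requested."""
    row = linear_terms(seq, alphabet)
    if nonlinear:
        row += interaction_terms(seq, alphabet, pairs)
    return row


# Assumed toy example: three positions, two allowed residues per position.
alphabet = "AV"
seq = "AVA"

linear_row = design_row(seq, alphabet, nonlinear=False)   # linear model
full_row = design_row(seq, alphabet)                      # all pairwise terms
subset_row = design_row(seq, alphabet, pairs=[(0, 2)])    # selected terms only

print(len(linear_row), len(full_row), len(subset_row))
```

Rows built this way for a training set of sequences, together with measured fitness values, could then be passed to any partial least-squares implementation. The number of interaction columns grows quadratically with sequence length, which is one reason a model restricted to a selected subset of terms may be attractive when the training set is small.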
