Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design.

Finding the minimum energy amino acid side-chain conformation is a fundamental problem in both homology modeling and protein design. Numerous computational algorithms have been proposed to address it. However, few quantitative comparisons between methods have been made, and there is little general understanding of which types of problems suit each algorithm. Here, we study four common search techniques: Monte Carlo (MC) and Monte Carlo plus quench (MCQ); genetic algorithms (GA); self-consistent mean field (SCMF); and dead-end elimination (DEE). SCMF and DEE are both deterministic, and when DEE converges, its solution is guaranteed to be the global minimum energy conformation (GMEC). This guarantee provides a reference against which the accuracy of SCMF and the stochastic methods can be measured. For the side-chain placement calculations, we find that DEE rapidly converges to the GMEC in all the test cases. The other algorithms converge on significantly incorrect solutions: the average fraction of incorrect rotamers is 0.12 for SCMF, 0.09 for GA, and 0.05 for MCQ. For the protein design calculations, design positions are progressively added to the side-chain placement calculation until the time required for DEE to converge increases sharply. The accuracy of each method is determined as the complexity of the problem increases, so that the results can be extrapolated into the region where DEE is no longer tractable. We find that both SCMF and MCQ perform reasonably well on core calculations (fraction of incorrect amino acids: SCMF 0.07, MCQ 0.04) but are considerably less accurate on boundary (SCMF 0.28, MCQ 0.32) and surface calculations (SCMF 0.37, MCQ 0.44).
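Why a converged DEE run is guaranteed to reach the GMEC can be illustrated with a minimal sketch. It assumes a pairwise-decomposable energy (one self term per rotamer plus one term per rotamer pair), which is the setting DEE requires. It applies only the Goldstein singles criterion: rotamer r at position i is eliminated if some other rotamer t at i satisfies E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0. The random energy tables, the function names (`make_problem`, `dee_singles`, `gmec_by_dee`, `gmec_brute_force`), and the problem sizes are hypothetical. This is not the implementation used in the study. Because no eliminated rotamer can belong to the GMEC, exhaustively enumerating the survivors recovers the same answer as brute force.

```python
import itertools
import random


def make_problem(n_pos, n_rot, seed=0):
    """Hypothetical toy energies: self terms e_self[i][r], pair terms e_pair[(i, j)][r][s] for i < j."""
    rng = random.Random(seed)
    e_self = [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_pos)]
    e_pair = {
        (i, j): [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_rot)]
        for i, j in itertools.combinations(range(n_pos), 2)
    }
    return e_self, e_pair


def pair(e_pair, i, r, j, s):
    """Interaction of rotamer r at position i with rotamer s at position j (either order)."""
    return e_pair[(i, j)][r][s] if i < j else e_pair[(j, i)][s][r]


def total_energy(conf, e_self, e_pair):
    """Pairwise-decomposable energy of a full conformation (one rotamer index per position)."""
    e = sum(e_self[i][r] for i, r in enumerate(conf))
    for i, j in itertools.combinations(range(len(conf)), 2):
        e += pair(e_pair, i, conf[i], j, conf[j])
    return e


def dee_singles(e_self, e_pair):
    """Repeatedly apply the Goldstein singles criterion until no rotamer can be removed."""
    n = len(e_self)
    alive = [set(range(len(row))) for row in e_self]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Lower bound on how much worse r is than t, for any surviving neighbours.
                    gap = e_self[i][r] - e_self[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair(e_pair, i, r, j, s) - pair(e_pair, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:  # replacing r with t always lowers the energy
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def gmec_by_dee(e_self, e_pair):
    """Prune with DEE, then enumerate whatever survives (one conformation if DEE converged)."""
    alive = dee_singles(e_self, e_pair)
    survivors = itertools.product(*[sorted(a) for a in alive])
    best = min(survivors, key=lambda c: total_energy(c, e_self, e_pair))
    return best, alive


def gmec_brute_force(e_self, e_pair):
    """Exhaustive reference search: only feasible for very small problems."""
    space = itertools.product(*[range(len(row)) for row in e_self])
    return min(space, key=lambda c: total_energy(c, e_self, e_pair))


e_self, e_pair = make_problem(n_pos=6, n_rot=4, seed=1)
conf, alive = gmec_by_dee(e_self, e_pair)
print("DEE result:", conf, "rotamers left per position:", [len(a) for a in alive])
print("matches brute force:", conf == gmec_brute_force(e_self, e_pair))
```

If every position is left with a single rotamer, DEE has converged and no enumeration is needed. When singles elimination stalls, the remaining combinations must still be searched; practical DEE codes add stronger criteria, such as pair elimination, to keep that residual space small.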

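A comparable sketch for the stochastic side runs Metropolis Monte Carlo over single-rotamer moves, followed by a quench. In the quench, each position is set in turn to its lowest-energy rotamer given the others until no single change lowers the energy. Accuracy is measured as the fraction of positions whose rotamer differs from the exact GMEC, which is found here by brute force on a toy problem. The random energies, temperature `kT`, step count, the choice to quench only the lowest-energy state visited, and all names (`random_problem`, `mcq`, `quench`, `fraction_incorrect`) are assumptions for illustration, not the protocol used in the paper.

```python
import itertools
import math
import random


def random_problem(n_pos, n_rot, seed=0):
    """Hypothetical toy energies: e[i][r] self terms, p[(i, j)][r][s] pair terms for i < j."""
    rng = random.Random(seed)
    e = [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_pos)]
    p = {(i, j): [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_rot)]
         for i, j in itertools.combinations(range(n_pos), 2)}
    return e, p


def energy(conf, e, p):
    """Total energy of a conformation given as one rotamer index per position."""
    return (sum(e[i][r] for i, r in enumerate(conf))
            + sum(p[(i, j)][conf[i]][conf[j]]
                  for i, j in itertools.combinations(range(len(conf)), 2)))


def local_energy(conf, i, r, e, p):
    """All energy terms that involve position i when it holds rotamer r."""
    total = e[i][r]
    for j, s in enumerate(conf):
        if j != i:
            total += p[(i, j)][r][s] if i < j else p[(j, i)][s][r]
    return total


def quench(conf, e, p):
    """Give each position its best rotamer given the others until no move helps."""
    conf = list(conf)
    improved = True
    while improved:
        improved = False
        for i in range(len(conf)):
            best = min(range(len(e[i])), key=lambda r: local_energy(conf, i, r, e, p))
            if local_energy(conf, i, best, e, p) < local_energy(conf, i, conf[i], e, p) - 1e-12:
                conf[i] = best
                improved = True
    return tuple(conf)


def mcq(e, p, steps=5000, kT=0.5, seed=0):
    """Metropolis MC with single-rotamer moves, then quench the lowest state visited."""
    rng = random.Random(seed)
    conf = [rng.randrange(len(row)) for row in e]
    current = energy(conf, e, p)
    best_conf, best_e = tuple(conf), current
    for _ in range(steps):
        i = rng.randrange(len(conf))
        r = rng.randrange(len(e[i]))
        delta = local_energy(conf, i, r, e, p) - local_energy(conf, i, conf[i], e, p)
        if delta <= 0 or rng.random() < math.exp(-delta / kT):  # Metropolis criterion
            conf[i] = r
            current += delta
            if current < best_e:
                best_conf, best_e = tuple(conf), current
    return quench(best_conf, e, p)


def fraction_incorrect(conf, reference):
    """Fraction of positions whose rotamer differs from the reference (e.g. the GMEC)."""
    return sum(a != b for a, b in zip(conf, reference)) / len(reference)


e, p = random_problem(n_pos=6, n_rot=4, seed=3)
gmec = min(itertools.product(range(4), repeat=6), key=lambda c: energy(c, e, p))
found = mcq(e, p, seed=3)
print("MCQ:", found, "GMEC:", gmec, "fraction incorrect:", fraction_incorrect(found, gmec))
```

The quench guarantees only a local minimum with respect to single-position changes, so `fraction_incorrect` can remain nonzero even after a long run. This is the kind of error that the incorrect-rotamer fractions reported in the abstract quantify.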