Side chain placement using estimation of distribution algorithms

OBJECTIVE: This paper presents an algorithm for solving the side chain placement problem.

METHODS AND MATERIALS: The algorithm combines the Goldstein elimination criterion with the univariate marginal distribution algorithm (UMDA), which stochastically searches the space of possible solutions. Its suitability for the problem is evaluated on a set of 425 proteins.

RESULTS: On a number of difficult instances where inference algorithms do not converge, UMDA is shown to find better structures.

CONCLUSIONS: The results show that the algorithm can find better structures than other state-of-the-art methods, such as inference-based techniques. A theoretical and empirical analysis of the computational cost of the proposed algorithm is also presented.
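The two-stage pipeline summarized in the abstract (prune rotamers with the Goldstein criterion, then search the reduced space with UMDA) can be illustrated with a minimal Python sketch. It assumes a standard discrete formulation in which each residue takes one rotamer and the energy is a sum of self terms and pairwise terms. The energy layout (`self_e`, `pair_e`), truncation selection, elitism, Laplace-smoothed marginals, the parameter values, and the random toy instance are all illustrative assumptions, not the paper's exact settings.

```python
import itertools
import random

# Assumed energy model (illustrative): self_e[i][r] is the self energy of
# rotamer r at residue i; pair_e[(i, j)][r][s] with i < j is the pairwise
# energy of rotamer r at residue i together with rotamer s at residue j.


def pair(pair_e, i, r, j, s):
    """Pairwise energy of rotamer r at residue i and rotamer s at residue j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def total_energy(conf, self_e, pair_e):
    """Self terms plus all pairwise terms of a full rotamer assignment."""
    n = len(conf)
    e = sum(self_e[i][conf[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_e[(i, j)][conf[i]][conf[j]]
    return e


def goldstein_dee(self_e, pair_e):
    """Apply the Goldstein criterion repeatedly until nothing more is pruned.

    Rotamer r at residue i is eliminated if some other rotamer t at i gives
    E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0.
    """
    alive = [list(range(len(row))) for row in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(alive)):
            for r in list(alive[i]):
                for t in alive[i]:
                    if t == r:
                        continue
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(len(alive)):
                        if j != i:
                            gap += min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:
                        alive[i].remove(r)  # r cannot be in the minimum-energy assignment
                        changed = True
                        break
    return alive


def umda(self_e, pair_e, alive, pop_size=100, n_sel=50, n_gen=30, seed=0):
    """UMDA over the surviving rotamers (truncation selection, elitism)."""
    rng = random.Random(seed)
    n = len(alive)
    weights = [[1.0] * len(a) for a in alive]  # uniform initial marginals
    best, best_e = None, float("inf")
    for _ in range(n_gen):
        # Sample each residue independently from its univariate marginal.
        pop = [[rng.choices(alive[i], weights=weights[i])[0] for i in range(n)]
               for _ in range(pop_size)]
        if best is not None:
            pop[0] = list(best)  # elitism: keep the best solution found so far
        scored = sorted(((total_energy(c, self_e, pair_e), c) for c in pop),
                        key=lambda x: x[0])
        if scored[0][0] < best_e:
            best_e, best = scored[0]
        selected = [c for _, c in scored[:n_sel]]
        for i in range(n):
            # Laplace correction (+1) keeps every surviving rotamer sampleable.
            weights[i] = [1 + sum(c[i] == r for c in selected) for r in alive[i]]
    return best, best_e


def brute_force(self_e, pair_e):
    """Exact optimum by exhaustive enumeration; feasible only for tiny instances."""
    return min((total_energy(list(c), self_e, pair_e), list(c))
               for c in itertools.product(*[range(len(row)) for row in self_e]))


def random_instance(n, k, seed=1):
    """Hypothetical toy instance: n residues, k rotamers each, random energies."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
    pair_e = {(i, j): [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
              for i in range(n) for j in range(i + 1, n)}
    return self_e, pair_e


if __name__ == "__main__":
    self_e, pair_e = random_instance(n=6, k=4)
    alive = goldstein_dee(self_e, pair_e)
    best, best_e = umda(self_e, pair_e, alive)
    print("rotamers left after elimination:", [len(a) for a in alive])
    print("best conformation:", best, "energy:", round(best_e, 3))
```

Running the module prints how many rotamers per residue survive the elimination step and the best conformation UMDA found. Because UMDA samples only from the rotamers that survive elimination, pruning shrinks the space it has to explore; `brute_force` is included only to check results on very small instances.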