A cluster-DEE-based strategy to empower protein design

Abstract The medical and pharmaceutical industries have shown strong interest in the precise engineering of protein hormones and enzymes that perform existing functions under a wide range of conditions. Proteins carry out many functions in the cell: catalysis of chemical reactions, transport and storage, regulation, and recognition control. Computational Protein Design (CPD) investigates the relationship between the 3-D structures of proteins and their amino acid sequences, and searches for all sequences that will fold into a given 3-D structure. Many computational methods and algorithms have been proposed in recent years, but the problem remains a challenge for mathematicians, computer scientists, bioinformaticians and structural biologists. In this article we present a new method for the protein design problem. Clustering techniques and a Dead-End Elimination (DEE) algorithm are combined with a Boolean satisfiability (SAT) representation of the CPD problem in order to design the amino acid sequences. The results obtained illustrate the accuracy of the proposed method, suggesting that integrated Artificial Intelligence techniques are useful tools for solving such an intricate problem.
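To make the Dead-End Elimination step concrete, the following minimal sketch applies the Goldstein elimination criterion, a standard DEE variant from the literature, to a toy pairwise energy model. The abstract does not say which DEE criterion, clustering scheme or SAT encoding the article uses, so this should not be read as the authors' implementation. The function name `goldstein_dee`, the data layout (`self_e`, `pair_e`) and the energy values are all hypothetical and chosen only for illustration.

```python
def goldstein_dee(self_e, pair_e):
    """Prune rotamers with the Goldstein dead-end elimination criterion.

    self_e[i][r]        -- self energy of rotamer r at position i (assumed layout)
    pair_e[(i, j)][r][s] -- pair energy of rotamer r at i and s at j, for i < j
    Returns one set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(energies))) for energies in self_e]

    def pair(i, r, j, s):
        # Pair energies are stored once per unordered position pair.
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    changed = True
    while changed:  # repeat until a full pass eliminates nothing
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Best case for r relative to t over all surviving neighbours.
                    gap = self_e[i][r] - self_e[i][t]
                    gap += sum(
                        min(pair(i, r, j, s) - pair(i, t, j, s) for s in alive[j])
                        for j in range(n)
                        if j != i
                    )
                    if gap > 0:  # r is always worse than t, so r is a dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy instance: two positions with 2 and 3 rotamers (invented energies).
self_e = [[0.0, 5.0], [0.0, 1.0, 4.0]]
pair_e = {(0, 1): [[0.0, -2.0, 0.0], [0.0, 0.0, 0.0]]}
print(goldstein_dee(self_e, pair_e))  # -> [{0}, {1}]
```

In this toy case the pruning leaves a single rotamer per position, which coincides with the minimum-energy assignment found by enumerating all six combinations. In general DEE only shrinks the search space, and the surviving rotamers would still have to be passed to a search or SAT stage.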
