Computational Design of Ligand Binding Proteins

Protein–ligand binding site prediction methods aim to predict, from amino acid sequence, protein–ligand interactions, putative ligands, and ligand binding site residues, using sequence information, structural information, or a combination of both. In silico characterization of protein–ligand interactions has become extremely important for determining a protein's function, because in vivo functional elucidation cannot keep pace with the current growth of sequence databases. In addition, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analyses such as drug discovery. In silico prediction of protein–ligand interactions is therefore needed to aid functional elucidation. Here, we briefly discuss protein function prediction, the prediction of protein–ligand interactions, and the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and Continuous Automated Model EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss in detail our cutting-edge web-server method, FunFOLD, for the structurally informed prediction of protein–ligand interactions. Furthermore, we provide a step-by-step guide to using the FunFOLD web server and the FunFOLD3 downloadable application, along with some real-world examples in which the FunFOLD methods have been used to aid functional elucidation.
