Hypergraph Model of Multi-residue Interactions in Proteins: Sequentially-Constrained Partitioning Algorithms for Optimization of Site-Directed Protein Recombination

Relationships among amino acids determine protein stability and function and are also constrained by evolutionary history. We develop a probabilistic hypergraph model of residue relationships that generalizes traditional pairwise contact potentials to account for the statistics of multi-residue interactions. Using this model, we detect non-random associations in protein families and in the protein database. We also use the model to optimize site-directed recombination experiments so that significant interactions are preserved, thereby increasing the frequency of useful recombinants. We formulate the optimization as a sequentially-constrained hypergraph partitioning problem, in which the quality of a recombinant library with respect to a set of breakpoints is characterized by the total perturbation to edge weights. We prove this problem to be NP-hard in general, but develop exact and heuristic polynomial-time algorithms for a number of important cases. Application to the beta-lactamase family demonstrates the utility of our algorithms in planning site-directed recombination.
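
To make the partitioning objective concrete, the following is a minimal sketch, not the algorithms of the paper. It assumes a simplified perturbation score: the summed weight of every hyperedge whose residues fall into more than one fragment. Residues are numbered from 0, a breakpoint b starts a new fragment at residue b, and breakpoints are chosen by exhaustive search. The names `fragment_of`, `perturbation`, `best_breakpoints`, and `example_edges` are hypothetical, as are the example weights.

```python
from bisect import bisect_right
from itertools import combinations


def fragment_of(residue, breakpoints):
    """Return the index of the fragment containing `residue`.

    Assumption: `breakpoints` is sorted, and a breakpoint b starts a new
    fragment at residue b, so the index is the number of breakpoints <= residue.
    """
    return bisect_right(breakpoints, residue)


def perturbation(hyperedges, breakpoints):
    """Sum the weights of hyperedges split across more than one fragment.

    Assumption: this is a simplified stand-in for the perturbation to edge
    weights; `hyperedges` is a list of (set_of_residues, weight) pairs.
    """
    cuts = sorted(breakpoints)
    return sum(
        weight
        for residues, weight in hyperedges
        if len({fragment_of(r, cuts) for r in residues}) > 1
    )


def best_breakpoints(n_residues, hyperedges, k):
    """Exhaustively search all sets of k breakpoints for the lowest score.

    Sequential constraint: fragments are contiguous runs of residues, so a
    partition is fully described by k sorted positions in 1..n_residues-1.
    This brute force is exponential in k and is only meant for toy inputs.
    """
    candidates = combinations(range(1, n_residues), k)
    best = min(candidates, key=lambda bps: perturbation(hyperedges, bps))
    return best, perturbation(hyperedges, best)


# Hypothetical toy instance: 6 residues and 4 weighted hyperedges.
example_edges = [
    ({0, 1, 2}, 2.0),
    ({1, 2}, 1.0),
    ({3, 4, 5}, 3.0),
    ({2, 3}, 0.5),
]

if __name__ == "__main__":
    print(best_breakpoints(6, example_edges, 1))
```

In this toy instance a single breakpoint at residue 3 splits only the lightest hyperedge, so it is the choice that disturbs the least total weight. The sequential constraint is what distinguishes this setting from general hypergraph partitioning: fragments must be contiguous along the sequence.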
