Paper information - Hypergraph Model of Multi-residue Interactions in Proteins: Sequentially-Constrained Partitioning Algorithms for Optimization of Site-Directed Protein Recombination

Relationships among amino acids determine stability and function and are also constrained by evolutionary history. We develop a probabilistic hypergraph model of residue relationships that generalizes traditional pairwise contact potentials to account for the statistics of multi-residue interactions. Using this model, we detected non-random associations in protein families and in the protein database. We also use this model in optimizing site-directed recombination experiments to preserve significant interactions and thereby increase the frequency of generating useful recombinants. We formulate the optimization as a sequentially-constrained hypergraph partitioning problem; the quality of recombinant libraries with respect to a set of breakpoints is characterized by the total perturbation to edge weights. We prove this problem to be NP-hard in general, but develop exact and heuristic polynomial-time algorithms for a number of important cases. Application to the beta-lactamase family demonstrates the utility of our algorithms in planning site-directed recombination.
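
To make the partitioning objective concrete, here is a minimal Python sketch under a simplified, assumed perturbation model: a hyperedge loses its whole weight whenever any breakpoint separates its residues. This all-or-nothing rule is only an illustration and is not the paper's exact edge-weight perturbation; the names `perturbation` and `best_breakpoints` and the toy edges are hypothetical. Under this simplified rule, a dynamic program over breakpoint positions finds an optimal set of k breakpoints in polynomial time.

```python
def perturbation(edges, breakpoints):
    """Total weight of hyperedges split by at least one breakpoint.

    edges: list of (set_of_residue_positions, weight), positions are 0-based.
    A breakpoint p cuts the sequence between positions p-1 and p.
    Assumption: an edge is fully perturbed if any breakpoint separates it.
    """
    return sum(
        w for residues, w in edges
        if any(min(residues) < p <= max(residues) for p in breakpoints)
    )


def best_breakpoints(n, edges, k):
    """Pick k breakpoints in 1..n-1 minimizing the simplified perturbation."""
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    # Under the all-or-nothing rule only the span (lo, hi) of an edge matters.
    spans = [(min(r), max(r), w) for r, w in edges]

    def inc(q, p):
        # Weight newly broken by breakpoint p when the previous one is q
        # (q = 0 means "no previous breakpoint"): edges lying entirely to the
        # right of q that p cuts, so each broken edge is counted exactly once.
        return sum(w for lo, hi, w in spans if q <= lo < p <= hi)

    inf = float("inf")
    # dp[t][p]: minimum cost using t breakpoints with the last one at p.
    dp = [[inf] * n for _ in range(k + 1)]
    back = [[None] * n for _ in range(k + 1)]
    for p in range(1, n):
        dp[1][p] = inc(0, p)
    for t in range(2, k + 1):
        for p in range(t, n):
            for q in range(t - 1, p):
                cost = dp[t - 1][q] + inc(q, p)
                if cost < dp[t][p]:
                    dp[t][p] = cost
                    back[t][p] = q

    # Recover the optimal breakpoints by following the back-pointers.
    last = min(range(k, n), key=lambda p: dp[k][p])
    chosen, p = [], last
    for t in range(k, 0, -1):
        chosen.append(p)
        p = back[t][p]
    return chosen[::-1], dp[k][last]


# Toy example: 8 residues, four weighted hyperedges, two breakpoints.
toy_edges = [({0, 1, 2}, 3.0), ({2, 3}, 1.0), ({4, 5, 7}, 2.0), ({1, 6}, 0.5)]
print(best_breakpoints(8, toy_edges, 2))  # -> ([3, 4], 1.5)
```

In the toy example the two heaviest hyperedges stay intact inside single fragments, which mirrors the stated goal of preserving significant interactions. A model in which the perturbation depends on which subsets of a hyperedge remain together would need a richer cost function than the span-based one assumed here.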
