Algorithms for selecting breakpoint locations to optimize diversity in protein engineering by site-directed protein recombination.

Protein engineering by site-directed recombination seeks to develop proteins with new or improved function by accumulating multiple mutations from a set of homologous parent proteins. A library of hybrid proteins is created by recombining the parent proteins at specified breakpoint locations; subsequent screening or selection identifies hybrids with desirable functional characteristics. To increase the frequency of generating novel hybrids, this paper develops the first approach to explicitly plan for diversity in site-directed recombination, including metrics for characterizing the diversity of a planned hybrid library and efficient algorithms for optimizing experiments accordingly. The goal is to choose breakpoint locations that sample sequence space as uniformly as possible (which we argue maximizes diversity), under the constraints imposed by the recombination process and the given set of parents. A dynamic programming approach selects optimal breakpoint locations in polynomial time. Application of our method to optimizing breakpoints for an example biosynthetic enzyme, purE, demonstrates the significance of diversity optimization and the effectiveness of our algorithms.
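The dynamic programming idea can be illustrated with a minimal sketch. It assumes a hypothetical diversity proxy that is not necessarily the paper's metric: each hybrid is an independent, uniform choice of parent for every segment, and a breakpoint set is scored by the variance of the Hamming distance between two randomly drawn hybrids. Under that model the expected distance is a sum of per-position terms and so does not depend on where the breakpoints fall, while the variance is a sum of independent per-segment variances; a lower variance means hybrids are spread more evenly around the mean distance. Because the score is additive over segments, a standard segmentation recurrence finds the optimum in polynomial time (about O(k n^3 p^2) as written, for n aligned positions, p parents and k segments). The names `segment_variance` and `choose_breakpoints`, and the `min_len` parameter (a stand-in for real recombination constraints), are illustrative assumptions.

```python
from itertools import product


def segment_variance(parents, i, j):
    """Variance of the Hamming distance, over positions [i, j), between two parents
    drawn independently and uniformly (ordered pairs, with replacement).

    Hypothetical diversity proxy for illustration; not the paper's exact metric.
    """
    dists = [sum(a[k] != b[k] for k in range(i, j))
             for a, b in product(parents, repeat=2)]
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / len(dists)


def choose_breakpoints(parents, num_segments, min_len=1):
    """Split aligned parents into num_segments contiguous segments, each at least
    min_len long (hypothetical constraint), minimizing summed per-segment variance.

    Returns (breakpoints, score); each breakpoint is the start index of a segment.
    """
    n = len(parents[0])
    inf = float("inf")
    # best[m][j]: lowest total variance for covering positions [0, j) with m segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[-1] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for m in range(1, num_segments + 1):
        for j in range(m * min_len, n + 1):
            # The last segment is [i, j); earlier segments cover [0, i).
            for i in range((m - 1) * min_len, j - min_len + 1):
                if best[m - 1][i] == inf:
                    continue
                cand = best[m - 1][i] + segment_variance(parents, i, j)
                if cand < best[m][j]:
                    best[m][j] = cand
                    back[m][j] = i
    if best[num_segments][n] == inf:
        raise ValueError("no segmentation satisfies min_len")
    # Follow back-pointers from the full sequence to recover segment starts.
    starts, j = [], n
    for m in range(num_segments, 0, -1):
        j = back[m][j]
        starts.append(j)
    # starts ends with 0 (the first segment); the rest, reversed, are the breakpoints.
    return starts[:-1][::-1], best[num_segments][n]


if __name__ == "__main__":
    # Two toy parents that differ only at positions 3, 4 and 5.
    print(choose_breakpoints(["AAAAAA", "AAABBB"], num_segments=3))
    # -> ([4, 5], 0.75)
```

With only two parents, a segment containing c differing positions contributes c^2/4, so this proxy favors breakpoints that spread the differing positions evenly across segments. Any other per-segment cost that is additive over segments could be substituted without changing the recurrence.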
