Optimization of Combinatorial Mutagenesis

Protein engineering by combinatorial site-directed mutagenesis evaluates a portion of the sequence space near a target protein, seeking variants with improved properties (e.g., stability, activity, immunogenicity). To improve the hit rate of beneficial variants in such mutagenesis libraries, we develop methods to select optimal positions and corresponding sets of mutations that will be used, in all combinations, in constructing a library for experimental evaluation. Our approach, OCoM (Optimization of Combinatorial Mutagenesis), encompasses both degenerate oligonucleotides and specified point mutations, and can be directed by requirements on experimental cost and library size. It evaluates the quality of the resulting library by one- and two-body sequence potentials, averaged over the variants. To ensure that it is not simply recapitulating extant sequences, it balances the quality of a library against an explicit evaluation of the novelty of its members. We show that, despite dealing with a combinatorial set of variants, the resulting library optimization problem is in our approach isomorphic to single-variant optimization. By the same token, library optimization under the two-body sequence potential is NP-hard. We present an efficient dynamic programming algorithm for the one-body case and a practically efficient integer programming approach for the general two-body case. We demonstrate the effectiveness of our approach in designing libraries for three case study proteins targeted by previous combinatorial libraries: a green fluorescent protein, a cytochrome P450, and a beta-lactamase. OCoM was efficient in practice, requiring only 1 hour even for the massive design problem of selecting 18 mutations to generate 10⁷ variants of a 443-residue P450. We demonstrate the general ability of OCoM to let the protein engineer explore and evaluate trade-offs between quality and novelty, as well as between library construction techniques, and to identify optimal libraries for experimental evaluation.
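
To make the one-body case concrete, the following is a minimal sketch and not the OCoM implementation. It assumes a hypothetical per-position score table `phi` (lower is better) and a hypothetical list of allowed amino acid sets per position. It illustrates two points: the score averaged over all combinations equals the sum of per-position averages, and one amino acid set per position can then be chosen by a small dynamic program that tracks library size. Novelty and the two-body terms are left out.

```python
from itertools import product

# Hypothetical one-body scores phi[position][amino_acid]; lower is assumed better.
PHI = {3: {"A": 0.0, "V": 1.0, "L": 2.0}, 7: {"K": 0.5, "R": 1.5}}


def library_mean_brute_force(choices, phi):
    """Average the one-body score over every variant, enumerating the library."""
    positions = sorted(choices)
    total, count = 0.0, 0
    for variant in product(*(choices[p] for p in positions)):
        total += sum(phi[p][a] for p, a in zip(positions, variant))
        count += 1
    return total / count


def library_mean_decomposed(choices, phi):
    """Same average, computed position by position without enumeration."""
    return sum(sum(phi[p][a] for a in aas) / len(aas) for p, aas in choices.items())


def best_library_per_size(candidates, phi, max_size):
    """Dynamic program over positions.

    candidates maps each position to the amino acid sets allowed there
    (an assumption of this sketch). Returns a dict mapping each reachable
    library size to (best averaged score, chosen set per position).
    """
    table = {1: (0.0, {})}
    for pos, sets in candidates.items():
        nxt = {}
        for size, (score, chosen) in table.items():
            for aas in sets:
                new_size = size * len(aas)
                if new_size > max_size:
                    continue  # library would exceed the size budget
                new_score = score + sum(phi[pos][a] for a in aas) / len(aas)
                if new_size not in nxt or new_score < nxt[new_size][0]:
                    nxt[new_size] = (new_score, {**chosen, pos: aas})
        table = nxt
    return table


if __name__ == "__main__":
    lib = {3: ("A", "V"), 7: ("K", "R")}
    print(library_mean_brute_force(lib, PHI), library_mean_decomposed(lib, PHI))
    cands = {3: [("A",), ("A", "V"), ("A", "V", "L")], 7: [("K",), ("K", "R")]}
    for size, (score, chosen) in sorted(best_library_per_size(cands, PHI, 4).items()):
        print(size, score, chosen)
```

Keying the table by library size, rather than returning a single optimum, exposes the quality versus size trade-off directly: without a size or novelty requirement, the best averaged score is trivially achieved by a single-variant library.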
