Algorithms for optimizing cross-overs in DNA shuffling

Background: DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library.

Results: This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. Our CODNS (cross-over optimization for DNA shuffling) approach employs polynomial-time dynamic programming algorithms to select codons for the parental amino acids, allowing for zero or a fixed number of conservative substitutions. We first present efficient algorithms to optimize the local sequence identity or the nearest-neighbor approximation of the change in free energy upon annealing, objectives that were previously optimized by computationally expensive integer programming methods. We then present efficient algorithms for more powerful objectives that seek to localize and enhance the frequency of recombination by producing "runs" of common nucleotides either overall or according to the sequence diversity of the resulting chimeras. We demonstrate the effectiveness of CODNS in choosing codons and allocating substitutions to promote recombination between parents targeted in earlier studies: two GAR transformylases (41% amino acid sequence identity), two very distantly related DNA polymerases, Pol X and β (15%), and beta-lactamases of varying identity (26-47%).

Conclusions: Our methods provide the protein engineer with a new approach to DNA shuffling that supports substantially more diverse parents, is more deterministic, and generates more predictable and more diverse chimeric libraries.
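
To make the idea of codon selection by dynamic programming concrete, here is a minimal Python sketch. It is not the CODNS algorithm itself: it assumes two parents that are already aligned without gaps, allows no amino acid substitutions, and uses a hypothetical objective (the sum of squared lengths of runs of matching nucleotides) as a stand-in for the "runs" objectives described above. The function name `optimize_runs` and the scoring rule are illustrative assumptions, not taken from the paper.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to its list of synonymous codons.
CODONS = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append(b1 + b2 + b3)


def optimize_runs(prot_a, prot_b):
    """Choose codons for two aligned, gap-free proteins.

    Hypothetical objective: maximize the sum of squared lengths of
    runs of identical nucleotides between the two coding sequences.
    Returns (score, dna_a, dna_b).
    """
    if len(prot_a) != len(prot_b):
        raise ValueError("proteins must be aligned to equal length")

    # DP state: length of the run of matches ending at the current
    # position -> (best score, dna_a so far, dna_b so far).
    states = {0: (0, "", "")}
    for aa_a, aa_b in zip(prot_a, prot_b):
        nxt = {}
        for run, (score, dna_a, dna_b) in states.items():
            for cod_a, cod_b in product(CODONS[aa_a], CODONS[aa_b]):
                r, s = run, score
                for x, y in zip(cod_a, cod_b):
                    if x == y:
                        # Growing a run from r to r+1 adds
                        # (r+1)^2 - r^2 = 2r + 1 to the score.
                        s += 2 * r + 1
                        r += 1
                    else:
                        r = 0
                if r not in nxt or s > nxt[r][0]:
                    nxt[r] = (s, dna_a + cod_a, dna_b + cod_b)
        states = nxt
    return max(states.values(), key=lambda t: t[0])


if __name__ == "__main__":
    score, dna_a, dna_b = optimize_runs("MKS", "MLR")
    print(score, dna_a, dna_b)
```

Because the future gain depends only on the length of the trailing run, keeping the best partial solution per run length is enough, and the running time is polynomial in the sequence length. The convex (squared) score rewards a few long runs over many short ones, which is one simple way to model the goal of concentrating identity where cross-overs can occur.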
