Efficient design of meganucleases using a machine learning approach

Background: Meganucleases are important tools for genome engineering, providing an efficient way to generate DNA double-strand breaks at specific loci of interest. Numerous experimental efforts, ranging from in vivo selection to in silico modeling, have been made to re-engineer meganucleases to target relevant DNA sequences.

Results: Here we present a novel in silico method for designing custom meganucleases based on machine learning. We compared it with existing in silico physical models and with high-throughput experimental screening. The machine learning model successfully predicted active meganucleases for 53 new DNA targets.

Conclusions: The new method performs competitively with state-of-the-art in silico physical models, with up to a fourfold increase in design success rate. Compared with high-throughput experimental screening, it reduces the number of screening experiments needed by more than 100-fold without affecting final performance.
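The abstract does not spell out the model, so the following is only a rough illustration of how a learned classifier can shrink an experimental screen. It is a minimal sketch, assuming that past screening outcomes are available as (protein variant, DNA target, cleaved or not) triples. It scores each pair with an L2-regularized logistic regression trained by stochastic gradient descent, then sends only the top-k variants for a new target to the lab. The residue-by-base feature encoding, the two-residue toy variants, the invented training history and all function names (`features`, `train`, `score`, `shortlist`) are hypothetical and are not the authors' method.

```python
import math
from itertools import product


def features(protein, target):
    """Sparse binary features for one (protein variant, DNA target) pair.

    Hypothetical encoding: every residue at every variable protein position
    is paired with every base at every target position, so the model can
    learn which residues favour which bases (a purely additive encoding
    would rank variants identically for every target).
    """
    feats = {"bias"}
    for (i, aa), (j, base) in product(enumerate(protein), enumerate(target)):
        feats.add(f"p{i}{aa}|t{j}{base}")
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples, epochs=200, lr=0.1, l2=1e-3):
    """Fit L2-regularized logistic regression by stochastic gradient descent.

    examples: (protein, target, label) triples, label 1 = cleavage observed.
    Shrinkage is applied only to features present in the current example,
    a common simplification for sparse SGD.
    """
    w = {}
    for _ in range(epochs):
        for protein, target, label in examples:
            x = features(protein, target)
            grad = sigmoid(sum(w.get(f, 0.0) for f in x)) - label
            for f in x:
                w[f] = w.get(f, 0.0) - lr * (grad + l2 * w.get(f, 0.0))
    return w


def score(w, protein, target):
    """Predicted probability that the variant cleaves the target."""
    return sigmoid(sum(w.get(f, 0.0) for f in features(protein, target)))


def shortlist(w, candidates, target, k):
    """Rank candidate variants for a new target; only the top k get screened."""
    return sorted(candidates, key=lambda p: score(w, p, target), reverse=True)[:k]


# Invented toy screening history: "RK" cuts targets with G at position 0,
# "QN" cuts targets with A at position 0.
history = [
    ("RK", "GT", 1), ("RK", "GC", 1), ("RK", "AT", 0), ("RK", "AC", 0),
    ("QN", "AT", 1), ("QN", "AC", 1), ("QN", "GT", 0), ("QN", "GC", 0),
]
w = train(history)
print(shortlist(w, ["QN", "RK"], "GA", k=1))  # prints ['RK']
```

The target GA never appears in the toy history, yet the learned residue-by-base weights still favour RK, so only one of the two candidates would be screened. The pairwise features are the essential design choice here: without them the model could not express that a variant suits one target and not another.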