Recognition of CRISPR/Cas9 off‐target sites through ensemble learning of uneven mismatch distributions

Motivation
CRISPR/Cas9 is driving a broad range of innovative applications, from basic biology to biotechnology and medicine. One of its current issues is off-target editing, which needs to be critically addressed and should ideally be avoided altogether.

Results
We developed an ensemble learning method to detect the off-target sites of a single guide RNA (sgRNA) among its thousands of genome-wide candidates. Nucleotide mismatches between on-target and off-target sites have been studied recently. We confirm that mismatches are strongly enriched, with position-specific preferences, in the regions close to the 5′ end of off-target sequences. Compared with on-target sites, the sequences of no-editing sites can also be characterized by changes in GC composition and by position-specific binary mismatch features. In this novel feature space, an ensemble strategy was applied to train a prediction model. In cross-validation on large datasets, the model achieved a mean area under the receiver operating characteristic curve (AUROC) of 0.99 and a mean area under the precision-recall curve (AUPRC) of 0.45, outperforming state-of-the-art methods in various test scenarios. The off-target sites we predict also correspond closely to those detected by high-throughput sequencing techniques. In particular, two case studies on selecting sgRNAs to treat hearing loss and retinal degeneration provide partial evidence of the method's effectiveness.

Availability and implementation
Python and MATLAB versions of the source code for detecting the off-target sites of a given sgRNA, together with the supplementary files, are freely available at https://github.com/penn-hui/OfftargetPredict.

Supplementary information
Supplementary data are available at Bioinformatics online.
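The two feature families named in the Results, position-specific binary mismatch indicators and the change in GC composition relative to the on-target site, can be illustrated with a short Python sketch. It assumes aligned, equal-length protospacer sequences written 5′ to 3′ (so index 0 is the 5′-most position) and ignores the PAM; the function names are hypothetical, and this is not the encoding used in the OfftargetPredict repository, which may differ in sequence length, PAM handling, or additional features.

```python
from typing import List


def mismatch_features(on_target: str, off_target: str) -> List[int]:
    """Position-specific binary mismatch indicators (hypothetical encoding).

    Assumes both sequences are aligned, of equal length, and written 5'->3',
    so index 0 is the 5'-most position.
    """
    if len(on_target) != len(off_target) or not on_target:
        raise ValueError("sequences must be non-empty, aligned and of equal length")
    # 1 where the candidate site differs from the on-target site, 0 where it matches
    return [int(a != b) for a, b in zip(on_target.upper(), off_target.upper())]


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a non-empty sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_change(on_target: str, off_target: str) -> float:
    """GC-content difference of the candidate site relative to the on-target site."""
    return gc_content(off_target) - gc_content(on_target)


def feature_vector(on_target: str, off_target: str) -> List[float]:
    """Concatenate the binary mismatch vector with the GC-change scalar."""
    mismatches = [float(x) for x in mismatch_features(on_target, off_target)]
    return mismatches + [gc_change(on_target, off_target)]


if __name__ == "__main__":
    on = "ACGTACGTACGTACGTACGT"   # toy 20-nt protospacer, not a real sgRNA
    off = "GCGTACGTACGTACGTACGT"  # a single mismatch at the 5'-most position
    print(mismatch_features(on, off))
    print(round(gc_change(on, off), 2))  # 0.05
```

Each candidate site then yields one fixed-length row (here 20 mismatch indicators plus one GC value) that a classifier can consume; the ensemble training step itself is not shown.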