Predicting off-target effects for end-to-end CRISPR guide design

The CRISPR-Cas9 system provides unprecedented genome editing capabilities. However, off-target effects lead to suboptimal use and are a bottleneck in the development of therapeutic applications. Herein, we introduce the first machine learning-based approach to this problem, yielding a state-of-the-art predictive model for CRISPR-Cas9 off-target effects that outperforms all other guide design services. Our approach, Elevation, consists of two inter-related machine learning models: one scores individual guide-target pairs, and the other aggregates those guide-target scores into a single, overall guide summary score. Through systematic investigation, we demonstrate that Elevation performs substantially better than competing approaches on both of these tasks. Additionally, we are the first to systematically evaluate approaches to the guide summary score problem; we show that the most widely used method (one re-implemented by several other servers) at times performs no better than random, whereas Elevation consistently outperforms it, sometimes by an order of magnitude. In our analyses, we also introduce a method to balance errors on truly active guides against errors on truly inactive ones, encapsulating a range of practical use cases, and show that Elevation is consistently superior across the entire range. We thus contribute a new evaluation metric for benchmarking off-target modeling. Finally, because of the large computational demands of our tasks, we have developed a cloud-based service for end-to-end guide design that incorporates our previously reported on-target model, Azimuth, as well as our new off-target model, Elevation.
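The idea of balancing errors on active guides against errors on inactive ones can be illustrated with a weighted rank correlation. The sketch below is only an illustration under stated assumptions, not the metric as defined in the paper: the function names, the `active_weight` parameter, and the rule that a measured activity above `threshold` counts as "active" are all hypothetical. Sweeping `active_weight` over a range of values would then trace out a range of use cases.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y under non-negative weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    # Note: undefined (division by zero) if either input is constant.
    return cov / math.sqrt(var_x * var_y)

def balanced_rank_correlation(truth, pred, active_weight, threshold=0.0):
    """Weighted Spearman-style score (hypothetical weighting scheme).

    Guides whose measured activity exceeds `threshold` get weight
    `active_weight`; all others get weight 1.
    """
    weights = [active_weight if t > threshold else 1.0 for t in truth]
    return weighted_pearson(average_ranks(truth), average_ranks(pred), weights)

# Toy data (made up): two inactive and two active guide-target pairs.
measured = [0.0, 0.0, 0.2, 0.9]
predicted = [0.1, 0.1, 0.5, 0.8]
for weight in (0.1, 1.0, 10.0):
    print(weight, balanced_rank_correlation(measured, predicted, weight))
```

A large `active_weight` emphasizes ranking the truly active sites correctly, while a small one emphasizes the inactive sites.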
