CRISPR-GNL: an improved model for predicting CRISPR activity by machine learning and featurization

Motivation: The CRISPR/Cas9 system has been broadly used in genetic engineering. However, the risk of off-target editing and the variability of on-target activity among different targets remain two limiting factors. Several bioinformatic tools have been developed to predict CRISPR on-target activity and off-target effects, but their general application is hampered by the large variation among algorithms.

Results: In this study, we thoroughly re-analyzed 13 published datasets with eight regression models. We showed that current models give very low cross-dataset and cross-species prediction performance. To overcome these limitations, we developed an improved model (a generalization score, GNL) based on normalized gene editing activity from 8,101 gRNAs and 2,488 features, using a Bayesian Ridge Regression model. Our results demonstrate that the GNL model is a better general algorithm for CRISPR on-target activity prediction.

Availability and implementation: The prediction scorer is available on GitHub (https://github.com/TerminatorJ/GNL_Scorer).

Contact: J.W. (wangjun6@genomics.cn) or Y.L. (luoyonglun@genomics.cn)

Supplementary information: Supplementary data are available at Bioinformatics online.
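To make the ideas of featurization, activity normalization and cross-dataset evaluation more concrete, the following is a minimal sketch and not the GNL implementation. The abstract does not specify the 2,488 features or the normalization scheme, so everything here is an assumption for illustration: the function names (`one_hot_features`, `rank_normalize`, `spearman`) are hypothetical, the features are limited to position-specific one-hot encoding plus GC fraction, and per-dataset rank scaling stands in for whatever normalization the authors used.

```python
from math import sqrt
from typing import List

NUCLEOTIDES = "ACGT"


def one_hot_features(guide: str) -> List[float]:
    """Encode a guide as position-specific one-hot values plus GC fraction.

    Illustrative features only; the real model uses a much larger feature set.
    """
    guide = guide.upper()
    if not guide or any(base not in NUCLEOTIDES for base in guide):
        raise ValueError("guide must be a non-empty string over A, C, G, T")
    features: List[float] = []
    for base in guide:
        # Four indicator values per position, in the order A, C, G, T.
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    features.append(sum(base in "GC" for base in guide) / len(guide))
    return features


def rank_normalize(values: List[float]) -> List[float]:
    """Scale the activities of ONE dataset to (0, 1] by rank (ties share the mean rank).

    Assumed normalization: it makes datasets measured with different assays
    comparable, at the cost of discarding absolute effect sizes.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank / n
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = rank_normalize(x), rank_normalize(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

In a cross-dataset test of the kind described above, a regression model would be fitted on the features and normalized activities of one dataset, and `spearman` would then compare its predictions with the measured activities of a different dataset or species.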
