An overview and metanalysis of machine and deep learning-based CRISPR gRNA design tools

ABSTRACT The CRISPR-Cas9 system has become the most promising and versatile tool for genetic manipulation applications. Although the technology has been broadly adopted by both the academic and pharmaceutical communities, the activity (on-target) and specificity (off-target) of CRISPR-Cas9 remain decisive factors for any application of the technology. Several in silico models and web tools for predicting gRNA activity and specificity have been developed, making CRISPR gene editing studies more convenient and precise. In this review, we present an overview and comparative analysis of machine and deep learning (MDL)-based algorithms, which are believed to be the most effective and reliable methods for predicting CRISPR gRNA on- and off-target activities. As an increasing number of sequence features and characteristics are discovered and incorporated into the MDL models, their predictions are getting closer to experimental observations. We also introduce the basic principles of CRISPR activity and specificity and summarize the challenges these models face, with the aim of helping the CRISPR community develop more accurate models for practical application.
