A Novel Hybrid CNN-SVR for CRISPR/Cas9 Guide RNA Activity Prediction

Accurate prediction of guide RNA (gRNA) on-target efficacy is critical for effective application of the CRISPR/Cas9 system. Although several machine learning-based and convolutional neural network (CNN)-based methods have been proposed, their prediction accuracy still leaves room for improvement. Here, we first improve the architectures of current CNNs for predicting gRNA on-target efficacy. Second, we propose a novel hybrid system that combines our improved CNN with support vector regression (SVR). This CNN-SVR system is composed of two major components: a merged CNN as the front-end for extracting gRNA features, and an SVR as the back-end that performs regression to predict gRNA cleavage efficiency. We demonstrate that CNN-SVR can effectively exploit feature interactions from feed-forward directions to learn deeper features of gRNAs and their corresponding epigenetic features. Experiments on commonly used datasets show that our CNN-SVR system outperforms available state-of-the-art methods in terms of prediction accuracy, generalization, and robustness. Source code is available at https://github.com/Peppags/CNN-SVR.
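
The two-stage data flow described above (a convolutional front-end that turns a gRNA sequence into a feature vector, followed by a support vector regressor that maps the features to an efficacy score) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a sequence-only input (the epigenetic branch of the merged CNN is omitted), uses randomly initialised filters in place of a trained CNN, and substitutes a linear epsilon-insensitive regressor trained by subgradient descent for a kernel SVR. All function names, hyperparameters, the 23-nt sequence length, and the toy GC-fraction target are hypothetical and chosen only for illustration.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a nucleotide string as a list of 4-element 0/1 rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_features(encoded, filters):
    """Front-end stand-in: 1D convolution, ReLU, then global max pooling.

    Each filter is a list of `width` rows of 4 weights. One pooled
    activation is returned per filter.
    """
    feats = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(encoded) - width + 1):
            act = sum(
                filt[k][c] * encoded[start + k][c]
                for k in range(width)
                for c in range(4)
            )
            best = max(best, act)
        feats.append(best)
    return feats


def fit_linear_svr(X, y, epsilon=0.05, c=1.0, lr=0.01, epochs=200):
    """Back-end stand-in: linear SVR fitted by subgradient descent.

    Minimises an L2 penalty on the weights plus the epsilon-insensitive
    loss; errors inside the epsilon tube contribute no gradient.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            g = 0.0 if abs(err) <= epsilon else (1.0 if err > 0 else -1.0)
            w = [wj - lr * (wj / len(X) + c * g * xj) for wj, xj in zip(w, xi)]
            b -= lr * c * g
    return w, b


def predict(model, X):
    """Apply the fitted linear model to a list of feature vectors."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


random.seed(0)
# Hypothetical untrained filters: 8 filters of width 3 over 4 channels.
filters = [
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    for _ in range(8)
]
# Synthetic 23-nt sequences; the GC-fraction label is a toy target,
# NOT real cleavage-efficiency data.
seqs = ["".join(random.choice(BASES) for _ in range(23)) for _ in range(30)]
labels = [sum(ch in "GC" for ch in s) / len(s) for s in seqs]

X = [conv_features(one_hot(s), filters) for s in seqs]  # front-end
model = fit_linear_svr(X, labels)                       # back-end
preds = predict(model, X)
```

In a realistic setting the front-end would be a trained network whose penultimate-layer activations are exported as features, and the back-end would typically be a library SVR rather than this hand-written stand-in; the sketch only shows how the two components hand data to each other.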
