cDeepbind: A context sensitive deep learning model of RNA-protein binding

Motivation: Determining the binding specificity of RNA-binding proteins (RBPs) is crucial for understanding many cellular processes and genetic disorders. RBP binding is known to be affected by both the sequence and the structure of RNAs. Deep learning can learn generalizable representations from raw data and has improved the state of the art in several fields, including image classification, speech recognition and genomics. Previous work on RBP binding has used either shallow models that combine sequence and structure or deep models that use only the sequence. Here we combine both abilities by augmenting and refining the original Deepbind architecture to capture structural information, and obtain significantly better performance.

Results: We propose two deep architectures: a lightweight convolutional network for transcriptome-wide inference, and a Long Short-Term Memory (LSTM) network that is suitable for small batches of data. We incorporate computationally predicted secondary structure features as input to our models and show their effectiveness in boosting prediction performance. Our models achieve significantly higher correlations on held-out in vitro test data than previous approaches, and generalise well to in vivo CLIP-seq data, achieving higher median AUCs than other approaches. We analyse the output of our model for VTS1 and CPO and provide intuition into how it works. Our models confirm known secondary structure preferences for some proteins and identify new proteins for which secondary structure might play a role. We also demonstrate the strengths of our model compared to other approaches, such as the ability to combine information from long distances along the input.

Availability: Software and models are available at https://github.com/shreshthgandhi/cDeepbind

Contact: ljlee@psi.toronto.edu, frey@psi.toronto.edu
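As an illustration of what it can mean to give a convolutional model both sequence and predicted structure, the following minimal sketch concatenates a one-hot sequence encoding with per-nucleotide structure probabilities and scores the result with a single filter. It is not the cDeepbind implementation: the function names, the two-column (paired, unpaired) structure profile and the hand-set filter are assumptions chosen for illustration, and in practice the structure probabilities would come from a secondary structure prediction tool.

```python
import numpy as np

NUCLEOTIDES = "ACGU"


def one_hot_rna(seq):
    """Return an (L, 4) one-hot matrix; unknown bases (e.g. N) stay all-zero."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper().replace("T", "U")):
        j = NUCLEOTIDES.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat


def combine_sequence_and_structure(seq, struct_probs):
    """Stack sequence and structure channels into one (L, 4 + K) input.

    struct_probs is a hypothetical (L, K) array of per-nucleotide structure
    probabilities (here K = 2: paired, unpaired), assumed to be precomputed.
    """
    struct_probs = np.asarray(struct_probs, dtype=np.float32)
    if struct_probs.ndim != 2 or struct_probs.shape[0] != len(seq):
        raise ValueError("struct_probs must have one row per nucleotide")
    return np.concatenate([one_hot_rna(seq), struct_probs], axis=1)


def conv_max_score(features, kernel):
    """Slide one filter along the sequence and keep the best window score.

    This is a single 1D convolution filter followed by global max pooling,
    the basic building block of a motif-scanning convolutional network.
    """
    width = kernel.shape[0]
    n_windows = features.shape[0] - width + 1
    if n_windows < 1:
        raise ValueError("input is shorter than the filter")
    scores = [np.sum(features[i:i + width] * kernel) for i in range(n_windows)]
    return float(max(scores))


# Toy input: 4 nucleotides, with illustrative (paired, unpaired) probabilities.
x = combine_sequence_and_structure(
    "ACGU", [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
)

# Hand-set filter of width 2 that responds to the dinucleotide "CG".
cg_filter = np.zeros((2, 6))
cg_filter[0, 1] = 1.0  # C at the first filter position
cg_filter[1, 2] = 1.0  # G at the second filter position
score = conv_max_score(x, cg_filter)
```

Because structure enters as extra input channels, a learned filter can weight a sequence motif differently depending on whether the same positions are predicted to be paired or unpaired.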
