Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks

RNA regulation depends significantly on its binding protein partners, known as RNA-binding proteins (RBPs). Unfortunately, the binding preferences of most RBPs are still not well characterized, especially from the structural point of view. Hidden informative signals and interdependencies between sequence and structure specificities are two challenges for both predicting RBP binding sites and accurately mining sequence and structure motifs. In this study, we propose a deep learning-based method, iDeepS, to simultaneously identify binding sequence and structure motifs from RNA sequences using convolutional neural networks (CNNs) and a bidirectional long short-term memory network (BLSTM). We first one-hot encode both the sequence and the predicted secondary structure, a representation suited to the subsequent convolution operations. To reveal the hidden binding knowledge from the observations, the CNNs are applied to learn abstract motif features. Considering the close relationship between sequences and predicted structures, we use the BLSTM to capture long-range dependencies between the binding sequence and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict RBP binding sites. We evaluated iDeepS on verified RBP binding sites derived from large-scale representative CLIP-seq datasets, and the results demonstrate that iDeepS can reliably predict RBP binding sites on RNAs and outperforms state-of-the-art methods. An important advantage is that iDeepS automatically extracts both binding sequence and structure motifs, which will improve our understanding of the mechanisms underlying the binding specificities of RBPs. iDeepS is available at https://github.com/xypan1232/iDeepS.
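
The one-hot encoding step described in the abstract can be illustrated with a minimal sketch. It is not taken from the iDeepS code base. The function name `one_hot`, the six-letter structure alphabet `FTIHMS`, and the choice to encode unknown symbols as all-zero rows are assumptions made here for illustration only.

```python
import numpy as np

RNA_ALPHABET = "ACGU"
# Assumed structure-annotation alphabet (one letter per structural context);
# the abstract does not specify the symbols actually used.
STRUCT_ALPHABET = "FTIHMS"


def one_hot(symbols, alphabet):
    """Return a (len(symbols), len(alphabet)) matrix with one 1.0 per known symbol.

    Symbols outside the alphabet (e.g. 'N') become all-zero rows; this is a
    choice made for this sketch, not a documented iDeepS behaviour.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    encoded = np.zeros((len(symbols), len(alphabet)), dtype=np.float32)
    for pos, ch in enumerate(symbols):
        col = index.get(ch)
        if col is not None:
            encoded[pos, col] = 1.0
    return encoded


# A toy sequence and a made-up per-nucleotide structure annotation of equal length.
sequence = "ACGUN"
structure = "FSSHT"

seq_matrix = one_hot(sequence, RNA_ALPHABET)          # shape (5, 4)
struct_matrix = one_hot(structure, STRUCT_ALPHABET)   # shape (5, 6)
```

Each matrix has one row per nucleotide position and one column per symbol, so a 1D convolution filter sliding along the rows behaves like a motif detector over a short window of positions.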
