Inferring Sequence-Structure Preferences of RNA-Binding Proteins with Convolutional Residual Networks

To infer the sequence and RNA structure specificities of RNA-binding proteins (RBPs) from experiments that enrich for bound sequences, we introduce ResidualBind, a convolutional residual network. ResidualBind significantly outperforms previous methods on experimental data from many RBP families. To identify the features it has learned from high-affinity sequences, we interrogate ResidualBind with saliency analysis and with first-order and second-order in silico mutagenesis. We show that, in addition to sequence motifs, ResidualBind learns a model that accounts for the number of motifs, their spacing, and both positive and negative effects of RNA structure context. Strikingly, ResidualBind learns RNA structure context, including detailed base-pairing relationships, directly from sequence data, a result we confirm on synthetic data. ResidualBind is a powerful, flexible, and interpretable model that can uncover cis-recognition preferences across a broad spectrum of RBPs.
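As a rough illustration of the first-order and second-order in silico mutagenesis mentioned above, the following minimal sketch scores every single-nucleotide mutant of a sequence and computes a pairwise interaction term for a double mutant. It is not the ResidualBind implementation: `toy_score` is a hypothetical stand-in for a trained model (it simply counts occurrences of an arbitrary motif), and the function names and the additive definition of the interaction term are assumptions made for this example.

```python
import numpy as np

ALPHABET = "ACGU"
MOTIF = "UGCAUG"  # arbitrary motif used only by the toy scoring function


def toy_score(seq):
    """Hypothetical stand-in for a trained model: count motif occurrences."""
    k = len(MOTIF)
    return float(sum(seq[i:i + k] == MOTIF for i in range(len(seq) - k + 1)))


def mutate(seq, pos, nt):
    """Return a copy of seq with the nucleotide at pos replaced by nt."""
    return seq[:pos] + nt + seq[pos + 1:]


def first_order_mutagenesis(seq, score_fn):
    """Score change for every single-nucleotide substitution.

    Returns an (L, 4) array whose entry [i, a] is
    score(mutant with ALPHABET[a] at position i) - score(wild type).
    """
    wild_type = score_fn(seq)
    delta = np.zeros((len(seq), len(ALPHABET)))
    for i in range(len(seq)):
        for a, nt in enumerate(ALPHABET):
            delta[i, a] = score_fn(mutate(seq, i, nt)) - wild_type
    return delta


def second_order_interaction(seq, i, nt_i, j, nt_j, score_fn):
    """Double-mutant effect minus the sum of the two single-mutant effects.

    A nonzero value means the two positions do not contribute additively
    under score_fn (assumed definition for this sketch).
    """
    wild_type = score_fn(seq)
    d_i = score_fn(mutate(seq, i, nt_i)) - wild_type
    d_j = score_fn(mutate(seq, j, nt_j)) - wild_type
    d_ij = score_fn(mutate(mutate(seq, i, nt_i), j, nt_j)) - wild_type
    return d_ij - d_i - d_j


if __name__ == "__main__":
    sequence = "AAUGCAUGAA"  # toy sequence containing one copy of the motif
    delta = first_order_mutagenesis(sequence, toy_score)
    print(delta)
    print(second_order_interaction(sequence, 2, "A", 3, "A", toy_score))
```

With a real model, `score_fn` would wrap the network's prediction for a sequence; the mutagenesis loops themselves do not depend on how the score is produced.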
