Prediction of mRNA subcellular localization using deep recurrent neural networks

Abstract

Motivation: Messenger RNA subcellular localization mechanisms play a crucial role in post-transcriptional gene regulation. This trafficking is mediated by trans-acting RNA-binding proteins interacting with cis-regulatory elements called zipcodes. While new sequencing-based technologies allow the high-throughput identification of RNAs localized to specific subcellular compartments, the precise mechanisms at play, and their dependency on specific sequence elements, remain poorly understood.

Results: We introduce RNATracker, a novel deep neural network built to predict, from their sequence alone, the distributions of mRNA transcripts over a predefined set of subcellular compartments. RNATracker integrates several state-of-the-art deep learning techniques (e.g. CNN, LSTM and attention layers) and can make use of both sequence and secondary structure information. We report on a variety of evaluations showing RNATracker’s strong predictive power, which is significantly superior to a variety of baseline predictors. Despite its complexity, several aspects of the model can be isolated to yield valuable, testable mechanistic hypotheses, and to locate candidate zipcode sequences within transcripts.

Availability and implementation: Code and data can be accessed at https://www.github.com/HarveyYan/RNATracker.

Supplementary information: Supplementary data are available at Bioinformatics online.
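To make the prediction task in the Results concrete, the following is a minimal NumPy sketch of how a sequence could be mapped to a distribution over compartments: one-hot encoding, a 1D convolution, attention-weighted pooling over positions, and a softmax output. It is not the RNATracker implementation. The recurrent (LSTM) layer and the secondary structure input are omitted for brevity, the weights are random and untrained, and the compartment labels, layer sizes and all function names are assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACGU"
# Assumed compartment labels; the abstract only says "a predefined set".
COMPARTMENTS = ["cytosol", "insoluble", "membrane", "nuclear"]


def one_hot(seq):
    """Encode an RNA string as a (length, 4) matrix; unknown bases stay all-zero."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, base in enumerate(seq.upper().replace("T", "U")):
        if base in index:
            x[pos, index[base]] = 1.0
    return x


def softmax(z):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def conv1d_relu(x, kernels):
    """Valid 1D convolution followed by ReLU; kernels has shape (width, 4, n_filters)."""
    width = kernels.shape[0]
    n_pos = x.shape[0] - width + 1
    out = np.stack([
        np.tensordot(x[i:i + width], kernels, axes=([0, 1], [0, 1]))
        for i in range(n_pos)
    ])
    return np.maximum(out, 0.0)


def random_params(rng, width=5, n_filters=8):
    """Untrained, randomly initialized weights (hypothetical sizes)."""
    return {
        "kernels": rng.normal(size=(width, len(ALPHABET), n_filters)),
        "attn_vector": rng.normal(size=n_filters),
        "out_weights": rng.normal(size=(n_filters, len(COMPARTMENTS))),
    }


def predict_distribution(seq, params):
    """Return (compartment probabilities, per-position attention weights)."""
    h = conv1d_relu(one_hot(seq), params["kernels"])   # (n_pos, n_filters)
    attn = softmax(h @ params["attn_vector"])          # (n_pos,), sums to 1
    context = attn @ h                                 # attention-weighted average
    probs = softmax(context @ params["out_weights"])   # (n_compartments,)
    return probs, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs, attn = predict_distribution("AUGGCUACGUAGCUAGCUAA", random_params(rng))
    for name, p in zip(COMPARTMENTS, probs):
        print(f"{name}: {p:.3f}")
    print("most attended window starts at position", int(attn.argmax()))
```

In a sketch like this, the attention weights give one score per convolution window, which is the kind of quantity that could be inspected to flag candidate zipcode regions within a transcript.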
