DeepPASTA: deep neural network based polyadenylation site analysis

MOTIVATION: Alternative polyadenylation (polyA) sites near the 3' end of a pre-mRNA give rise to multiple mRNA transcripts with different 3' untranslated regions (3' UTRs). Sequence elements within a 3' UTR are essential for many biological activities, such as mRNA stability, subcellular localization, protein binding, protein translation and translation efficiency. Moreover, numerous studies have reported correlations between diseases and the shortening (or lengthening) of 3' UTRs. Because alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider a limited set of sequence features or use relatively old algorithms, and none of them uses RNA secondary structure as a feature for polyA site prediction.

RESULTS: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. In addition, the tool can predict the most dominant (i.e. most frequently used) polyA site of a gene in a specific tissue, as well as the relative dominance of two given polyA sites of the same gene. Our extensive experiments demonstrate that DeepPASTA significantly outperforms existing tools for polyA site prediction and for tissue-specific relative and absolute dominant polyA site prediction.

AVAILABILITY AND IMPLEMENTATION: https://github.com/arefeen/DeepPASTA.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
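The RESULTS paragraph describes a model whose inputs are both an RNA sequence and its secondary structure. As a rough illustration of how two such inputs could be turned into numeric matrices for a neural network, the sketch below one-hot encodes a nucleotide sequence and a matching dot-bracket structure string. The alphabets, the function names `one_hot` and `encode_pair`, and the choice of dot-bracket notation are all assumptions made for this example; they are not taken from the DeepPASTA code, which may use a different structure alphabet or encoding. In practice the structure string would come from an RNA folding program; here it is just a placeholder.

```python
import numpy as np

# Hypothetical alphabets chosen for illustration only (not necessarily DeepPASTA's).
NUCLEOTIDES = "ACGU"        # RNA bases
STRUCTURE_SYMBOLS = "()."   # dot-bracket: paired-open, paired-close, unpaired


def one_hot(seq: str, alphabet: str) -> np.ndarray:
    """Return a (len(seq), len(alphabet)) one-hot matrix.

    Symbols not in the alphabet (e.g. 'N') are encoded as an all-zero row.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    mat = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, ch in enumerate(seq.upper()):
        i = index.get(ch)
        if i is not None:
            mat[pos, i] = 1.0
    return mat


def encode_pair(sequence: str, structure: str) -> tuple[np.ndarray, np.ndarray]:
    """Encode a sequence and its per-position structure annotation (hypothetical helper)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    # Treat DNA input as RNA by converting T to U before encoding.
    rna = sequence.upper().replace("T", "U")
    return one_hot(rna, NUCLEOTIDES), one_hot(structure, STRUCTURE_SYMBOLS)
```

For example, `encode_pair("CAAUAAAG", "((....))")` yields an 8 x 4 sequence matrix and an 8 x 3 structure matrix, which could be fed to separate input branches of a network. Keeping the two encodings as separate matrices (rather than concatenating them) is one design choice among several; it simply makes the two input types explicit.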
