DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning

Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatics work has mainly focused on recognizing polyadenylation sites (PAS) in a given genomic sequence, which is a binary classification problem. More recently, computational methods have been proposed for predicting the usage levels of alternative PAS within the same gene. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not account for the competition among multiple PAS. To address this, we propose DeeReCT-APA, a deep learning architecture that quantitatively predicts the usage of all alternative PAS of a given gene. Because different genes can have different numbers of PAS, DeeReCT-APA treats the problem as a regression task with a variable-length target. Built on a CNN-LSTM architecture, DeeReCT-APA extracts sequence features with CNN layers, uses a bidirectional LSTM to explicitly model the interactions among competing PAS, and outputs percentage scores representing the usage levels of all PAS of a gene. Besides being the only method that quantitatively predicts the usage of all PAS within a gene, DeeReCT-APA consistently outperforms existing methods on the three tasks those methods were trained for: the pairwise comparison task, the highest-usage prediction task and the ranking task. Finally, we demonstrate that our method can predict the effect of genetic variants on APA patterns and can help guide future mechanistic studies of APA regulation. Our code and data are available at https://github.com/lzx325/DeeReCT-APA-repo.
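The pipeline described above (CNN features for each PAS, a bidirectional LSTM running across the competing PAS of one gene, and a normalized percentage per PAS) can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the DeeReCT-APA implementation: the filter count, kernel width, hidden size, global max pooling, softmax output layer, 5'-to-3' ordering of PAS and random weights are all hypothetical choices, and the names `predict_usage`, `variant_effect` and `derived_tasks` are invented for this example. The sketch does show why a variable-length target is natural: the same weights handle a gene with two PAS or five, and because all PAS share one softmax, raising one site's predicted usage necessarily lowers the others'.

```python
import math
import random

# Toy hyperparameters: assumptions for illustration, not the published settings.
ALPHABET = "ACGT"
N_FILTERS = 4   # number of convolution filters
KERNEL = 5      # filter width in nucleotides
HIDDEN = 3      # LSTM hidden size per direction

rng = random.Random(0)  # random, untrained weights; outputs are not biologically meaningful


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def one_hot(seq):
    """Encode a nucleotide string as a list of 4-dim one-hot vectors."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


CONV = [rand_matrix(KERNEL, 4) for _ in range(N_FILTERS)]


def cnn_features(seq):
    """1D convolution + ReLU + global max pooling: one feature vector per PAS sequence."""
    x = one_hot(seq)
    feats = []
    for filt in CONV:
        best = 0.0  # the max over ReLU outputs is never below 0
        for start in range(len(x) - KERNEL + 1):
            act = sum(filt[k][c] * x[start + k][c] for k in range(KERNEL) for c in range(4))
            best = max(best, act)
        feats.append(best)
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def make_lstm(input_size):
    # One weight matrix over [x; h] and one bias per gate: input, forget, cell, output.
    return {g: (rand_matrix(HIDDEN, input_size + HIDDEN), [0.0] * HIDDEN) for g in "ifgo"}


def run_lstm(params, inputs):
    """Run a standard LSTM over a list of vectors and return every hidden state."""
    h, c, outputs = [0.0] * HIDDEN, [0.0] * HIDDEN, []
    for x in inputs:
        gates = {}
        for g, (w, b) in params.items():
            pre = [p + q for p, q in zip(matvec(w, x + h), b)]
            gates[g] = [math.tanh(p) if g == "g" else sigmoid(p) for p in pre]
        c = [f * cc + i * gg for f, cc, i, gg in zip(gates["f"], c, gates["i"], gates["g"])]
        h = [o * math.tanh(cc) for o, cc in zip(gates["o"], c)]
        outputs.append(h)
    return outputs


FWD, BWD = make_lstm(N_FILTERS), make_lstm(N_FILTERS)
W_OUT = rand_matrix(1, 2 * HIDDEN)[0]


def predict_usage(pas_sequences):
    """Predict one usage percentage per PAS (assumed in 5'-to-3' order); sums to 100."""
    feats = [cnn_features(s) for s in pas_sequences]
    fwd = run_lstm(FWD, feats)
    bwd = run_lstm(BWD, feats[::-1])[::-1]  # re-align backward states to 5'-to-3' order
    logits = [sum(w * v for w, v in zip(W_OUT, hf + hb)) for hf, hb in zip(fwd, bwd)]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def variant_effect(pas_sequences, index, alt_sequence):
    """Hypothetical helper: change in predicted usage when PAS `index` carries a variant."""
    mutated = list(pas_sequences)
    mutated[index] = alt_sequence
    return [a - r for a, r in zip(predict_usage(mutated), predict_usage(pas_sequences))]


def derived_tasks(usage, a, b):
    """Hypothetical helper: read pairwise, highest-usage and ranking outputs off one prediction."""
    ranking = sorted(range(len(usage)), key=lambda i: usage[i], reverse=True)
    return {"pairwise": a if usage[a] >= usage[b] else b, "highest": ranking[0], "ranking": ranking}


if __name__ == "__main__":
    gene = ["TTAATAAAGCTTGTGTTTCA", "GCAATAAACTGTTTTGCATC", "CCATTAAAGGTGTGCTTTTT"]
    usage = predict_usage(gene)
    print([round(p, 1) for p in usage], derived_tasks(usage, 0, 2))
```

Because every PAS competes in a single normalized output, one quantitative prediction already contains the answers to the pairwise, highest-usage and ranking tasks, which is what `derived_tasks` reads off. Likewise, an in-silico variant effect in this sketch is just the difference between predictions for the reference and alternative sequences, so the per-PAS changes sum to zero.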
