APAtrap: identification and quantification of alternative polyadenylation sites from RNA-seq data

Motivation: Alternative polyadenylation (APA) is increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and gene expression regulation. As RNA-seq has become a routine protocol for transcriptome analysis, it is of great interest to leverage the unprecedented collection of RNA-seq data with new computational methods that extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely either on transcript assembly to determine transcript 3' ends or on annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage across more than two poly(A) sites.

Results: We developed an approach called APAtrap, based on the mean squared error model, to identify and quantify APA sites from RNA-seq data. APAtrap can identify novel 3' UTRs and 3' UTR extensions, which helps locate potential poly(A) sites in previously overlooked regions and improves genome annotations. APAtrap also aims to tally all potential poly(A) sites and to detect genes with differential APA site usage between conditions. Extensive comparisons of APAtrap with two other recent methods, ChangePoint and DaPars, on RNA-seq datasets from simulation studies, human and Arabidopsis demonstrate the efficacy and flexibility of APAtrap for any organism with an annotated genome.

Availability and implementation: Freely available for download at https://apatrap.sourceforge.io.

Contact: liqq@xmu.edu.cn or xhuister@xmu.edu.cn.

Supplementary information: Supplementary data are available at Bioinformatics online.
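The abstract does not spell out how the mean squared error model is applied, so the following is only a minimal illustrative sketch and not APAtrap's actual implementation. It assumes a per-base read coverage vector over a 3' UTR and places a single change point where a two-segment piecewise-constant fit has the lowest mean squared error, which is one common way a drop in coverage can suggest a candidate proximal poly(A) site. The function name `best_change_point` and the toy coverage values are hypothetical; a method that handles more than two poly(A) sites would need to place several such change points.

```python
from typing import List, Tuple


def best_change_point(coverage: List[float]) -> Tuple[int, float]:
    """Return (split index, mean squared error) for the best two-segment fit.

    The split index k means positions [0, k) form the first segment and
    positions [k, n) form the second segment. Hypothetical helper for
    illustration only; not part of APAtrap.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions to place a change point")

    # Prefix sums of values and squared values give each segment's squared
    # error in constant time: sum(x^2) - (sum(x))^2 / length.
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in coverage:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def segment_sse(start: int, end: int) -> float:
        total = prefix[end] - prefix[start]
        total_sq = prefix_sq[end] - prefix_sq[start]
        return total_sq - total * total / (end - start)

    # Try every split and keep the one with the smallest overall MSE.
    best_k, best_mse = 1, float("inf")
    for k in range(1, n):
        mse = (segment_sse(0, k) + segment_sse(k, n)) / n
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse


# Toy coverage (assumed values): high over the short isoform, lower downstream.
toy_coverage = [30.0] * 6 + [10.0] * 4
split, mse = best_change_point(toy_coverage)
```

In this toy input the coverage drops after the sixth position, so the lowest-error split separates the first six positions from the last four. Real coverage is noisy, and any thresholds for accepting a change point would have to come from the method's own documentation.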
