scAPAtrap: identification and quantification of alternative polyadenylation sites from single-cell RNA-seq data

Alternative polyadenylation (APA) generates diverse mRNA isoforms, contributing to transcriptome diversity and to gene expression regulation by affecting mRNA stability, translation and localization. The rapid development of 3' tag-based single-cell RNA-sequencing (scRNA-seq) technologies, such as CEL-seq and 10x Genomics, has led to the emergence of computational methods for identifying APA sites and profiling APA dynamics at single-cell resolution. However, existing methods cannot pinpoint the precise location of poly(A) sites and miss sites with low read coverage. Moreover, they rely on a priori genome annotation and can only detect poly(A) sites located within or near annotated genes. Here we propose scAPAtrap, a tool for detecting poly(A) sites genome-wide in individual cells from 3' tag-based scRNA-seq data. scAPAtrap combines peak identification with poly(A) read anchoring, which allows it to locate poly(A) sites precisely, even sites with low read coverage. In addition, scAPAtrap can identify poly(A) sites without a priori genome annotation, which helps locate novel poly(A) sites in previously overlooked regions and improve genome annotation. We compared scAPAtrap with two recent methods, scAPA and Sierra, using scRNA-seq data from different experimental technologies and species. The results show that scAPAtrap identified poly(A) sites with higher accuracy and sensitivity than the competing methods and can be used to explore APA dynamics among cell types or heterogeneous APA isoform expression in individual cells. scAPAtrap is available at https://github.com/BMILAB/scAPAtrap.
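The two ingredients named above, peak identification and poly(A) read anchoring, can be illustrated with a toy sketch. This is not scAPAtrap's implementation: the `Read` record, the thresholds (`min_a`, `max_non_a`, `min_cov`, `window`), the simple coverage-threshold peak caller and the rule "take the most frequent end of a poly(A)-tailed read inside a peak" are all hypothetical choices made for illustration, and only the + strand is handled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical simplified + strand alignment (0-based, half-open [start, end))."""
    start: int
    end: int
    soft_clip_3p: str  # bases soft-clipped at the read's 3' end ("" if none)


def has_polya_tail(clip: str, min_a: int = 8, max_non_a: int = 1) -> bool:
    """Treat a 3' soft clip as a poly(A) tail if it is long and almost all A."""
    return len(clip) >= min_a and sum(b != "A" for b in clip) <= max_non_a


def call_peaks(reads, min_cov: int = 3):
    """Return half-open (start, end) intervals where per-base coverage >= min_cov."""
    cov = Counter()
    for r in reads:
        for pos in range(r.start, r.end):
            cov[pos] += 1
    peaks, cur = [], None
    for pos in sorted(cov):
        if cov[pos] < min_cov:
            continue
        if cur is not None and pos == cur[1]:
            cur[1] = pos + 1  # extend the current contiguous peak
        else:
            if cur is not None:
                peaks.append(tuple(cur))
            cur = [pos, pos + 1]  # start a new peak
    if cur is not None:
        peaks.append(tuple(cur))
    return peaks


def anchor_sites(reads, peaks, window: int = 20):
    """For each peak, pick the most frequent last aligned base of poly(A)-tailed
    reads falling in the peak or up to `window` bp downstream; None if unsupported."""
    ends = [r.end - 1 for r in reads if has_polya_tail(r.soft_clip_3p)]
    sites = []
    for s, e in peaks:
        hits = Counter(x for x in ends if s <= x < e + window)
        sites.append(hits.most_common(1)[0][0] if hits else None)
    return sites


reads = [
    Read(100, 150, ""),
    Read(110, 160, ""),
    Read(120, 170, "AAAAAAAAAA"),
    Read(125, 170, "AAAAAAAAAAAA"),
    Read(130, 168, "AAAAACAAA"),
]
peaks = call_peaks(reads)
print(peaks)                        # [(120, 168)]
print(anchor_sites(reads, peaks))   # [169]
```

The peak gives a coarse region supported by many reads, while the few reads that still carry untemplated A's pin the cleavage position to a single base; a peak with no poly(A)-tailed read is left unanchored here rather than guessed.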
