Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data

The rate at which data are generated and analysis methods emerge makes it increasingly difficult to keep track of the methods' domains of applicability, assumptions, and limitations and, consequently, of the efficacy and precision with which they solve specific tasks. There is therefore a growing need for benchmarks and for infrastructure that supports continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools that identify alternative polyadenylation (APA) sites and quantify their usage from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3′-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows the set of methods, metrics, and challenges to be extended seamlessly. We envisage that our analyses will assist researchers in selecting appropriate tools for their studies. Furthermore, the containers and reproducible workflows generated in the course of this project can be readily deployed and extended to evaluate new methods or datasets.
