KLEAT: Cleavage Site Analysis of Transcriptomes

In eukaryotic cells, alternative cleavage of 3' untranslated regions (UTRs) can affect transcript stability, transport and translation. For polyadenylated (poly(A)) transcripts, cleavage sites can be characterized with short-read sequencing using specialized library construction methods. However, for large-scale cohort studies and for clinical sequencing applications, it is desirable to characterize such events using RNA-seq data, which are already widely used to identify other relevant information, such as mutations, alternative splicing and chimeric transcripts. Here we describe KLEAT, an analysis tool that uses de novo assembly of RNA-seq data to characterize cleavage sites on 3' UTRs. We demonstrate the performance of KLEAT on three cell line RNA-seq libraries constructed and sequenced by the ENCODE project, and assembled using Trans-ABySS. Validating the KLEAT predictions against matched ENCODE RNA-seq and RNA-PET libraries, we show that the tool achieves a positive predictive value of over 90% when a prediction is supported by at least three RNA-seq reads supporting a poly(A) tail and validation requires at least three RNA-PET reads mapping within 100 nucleotides of the predicted site. We also compare the performance of KLEAT with other popular RNA-seq analysis pipelines that reconstruct 3' UTR ends, and show that it performs favourably, based on an ROC-like curve.
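The validation criterion above can be illustrated with a short sketch. This is not KLEAT's code; it only shows how a positive predictive value could be computed under the stated thresholds (at least three poly(A)-supporting RNA-seq reads per predicted site, at least three RNA-PET reads within 100 nucleotides). The names `PredictedSite`, `count_within` and `positive_predictive_value` are hypothetical, the input layout is an assumption, the window is treated as inclusive, and strand is ignored for brevity.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedSite:
    """A predicted cleavage site (hypothetical record layout, not KLEAT's output format)."""
    chrom: str
    position: int
    polya_read_support: int  # RNA-seq reads supporting a poly(A) tail at this site


def count_within(sorted_positions, center, window):
    """Count positions p in a sorted list with center - window <= p <= center + window."""
    lo = bisect_left(sorted_positions, center - window)
    hi = bisect_right(sorted_positions, center + window)
    return hi - lo


def positive_predictive_value(sites, pet_ends, min_tail_reads=3,
                              min_pet_reads=3, window=100):
    """Fraction of sufficiently supported predictions that RNA-PET validates.

    pet_ends maps chromosome name -> list of RNA-PET 3' end positions
    (assumed input layout). Returns None if no prediction passes the
    read-support threshold.
    """
    # Sort once per chromosome so each lookup is a binary search.
    index = {chrom: sorted(positions) for chrom, positions in pet_ends.items()}

    # Keep only predictions with enough poly(A)-supporting reads.
    called = [s for s in sites if s.polya_read_support >= min_tail_reads]
    if not called:
        return None

    # A called site counts as validated when enough RNA-PET ends fall in the window.
    validated = sum(
        1 for s in called
        if count_within(index.get(s.chrom, []), s.position, window) >= min_pet_reads
    )
    return validated / len(called)
```

With these defaults, a site supported by two poly(A) reads is never counted, and a site with three or more is counted as a true positive only if three or more RNA-PET ends lie within 100 nucleotides of it. Varying `min_tail_reads` traces out the kind of trade-off an ROC-like curve would summarize.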
