Prediction of Poly(A) Sites by Poly(A) Read Mapping

RNA-seq reads that contain part of a transcript's poly(A) tail (denoted as poly(A) reads) provide the most direct evidence for the positions of poly(A) sites in the genome. However, because poly(A) tails are poorly covered by reads, poly(A) reads are not routinely identified during RNA-seq mapping. Nevertheless, recent studies on several herpesviruses successfully used poly(A) read mapping to identify herpesvirus poly(A) sites, each with its own strategy and customized programs. To make such analyses possible without additional programs, we integrated poly(A) read mapping and poly(A) site prediction into our RNA-seq mapping program ContextMap 2. The implemented approach generalizes previously used poly(A) read mapping approaches and combines them with the context-based approach of ContextMap 2, which takes into account information from other reads aligned to the same location. We evaluated poly(A) read mapping with ContextMap 2 on real-life data from the ENCODE project and compared it against a competing approach based on transcriptome assembly (KLEAT). Our approach achieved a high positive predictive value, further supported by the presence of poly(A) signals, and ran considerably faster than KLEAT. Although sensitivity is low for both methods, we show that this is partly due to a high proportion of spurious entries in the gold standard set derived from RNA-PET data. Sensitivity improves for poly(A) sites of known transcripts or sites determined with a more specific poly(A) sequencing protocol, and it increases with read coverage at transcript ends. Finally, we illustrate the usefulness of the approach in a high-coverage scenario by re-analyzing published data for herpes simplex virus 1. Thus, given current trends towards greater sequencing depth and longer reads, poly(A) read mapping will become increasingly useful, and it can now be performed automatically during RNA-seq mapping with ContextMap 2.
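The basic idea of poly(A) read mapping can be illustrated with a minimal Python sketch. This is not ContextMap 2's implementation: a naive exact-match search (`find_unique`) stands in for a real spliced aligner, the transcript is assumed to lie on the forward genomic strand, and all function names and thresholds (`min_tail`, `min_prefix`, the 40-nt signal window) are hypothetical choices for illustration. A read ending in a run of A's (or starting with a run of T's, if it comes from the antisense cDNA strand) is clipped, the remainder is mapped, and the poly(A) site is reported as the last templated base. A's that the genome itself encodes next to the alignment are not counted as tail, which discards reads whose A-stretch is explained by the genome. Finally, the canonical AAUAAA hexamer and its common AUUAAA variant (AATAAA/ATTAAA in DNA) are searched upstream of the site.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
# Canonical poly(A) signal hexamer and its most common variant, in DNA letters.
SIGNALS = ("AATAAA", "ATTAAA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trailing_a(seq: str) -> int:
    """Length of the run of A's at the 3' end of seq."""
    return len(seq) - len(seq.rstrip("A"))


def find_unique(genome: str, query: str) -> Optional[int]:
    """Start of the single exact occurrence of query, or None if absent or repeated.

    Hypothetical stand-in for a real read aligner.
    """
    first = genome.find(query)
    if first == -1 or genome.find(query, first + 1) != -1:
        return None
    return first


def call_polya_site(read: str, genome: str,
                    min_tail: int = 5, min_prefix: int = 12) -> Optional[int]:
    """0-based genomic position of the last templated base, or None."""
    # A read from the antisense cDNA strand starts with poly(T); flip it.
    if trailing_a(read) < min_tail and trailing_a(revcomp(read)) >= min_tail:
        read = revcomp(read)
    tail = trailing_a(read)
    if tail < min_tail:
        return None  # no poly(A) stretch long enough to count as a tail
    prefix = read[:-tail]
    if len(prefix) < min_prefix:
        return None  # too short to place reliably
    start = find_unique(genome, prefix)
    if start is None:
        return None  # unmapped or multi-mapping prefix
    end = start + len(prefix)  # exclusive end of the aligned prefix
    # Re-absorb A's that the genome encodes right after the alignment.
    extended = 0
    while extended < tail and end < len(genome) and genome[end] == "A":
        end += 1
        extended += 1
    if tail - extended < min_tail:
        return None  # A-stretch is largely genome-encoded (e.g. internal priming)
    return end - 1


def has_signal(genome: str, site: int, window: int = 40) -> bool:
    """True if a poly(A) signal hexamer lies in the window ending at site."""
    upstream = genome[max(0, site + 1 - window): site + 1]
    return any(sig in upstream for sig in SIGNALS)


if __name__ == "__main__":
    genome = "GCTAGCTTGACCGTTAGC" "AATAAA" "GCATGCTCGTGACTG" "CGTTCGCTGGTC"
    read = genome[21:39] + "A" * 8  # last 18 templated bases + 8-nt tail
    site = call_polya_site(read, genome)
    print(site, has_signal(genome, site))  # 38 True
```

When genome-encoded A's sit at the clip boundary, the sketch assigns them to the transcript and reports the most downstream compatible position; in real data this boundary is inherently ambiguous. The sketch also looks at each read in isolation, whereas ContextMap 2 additionally weighs the other reads aligned to the same location.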
