A change-point model for identifying 3′UTR switching by next-generation RNA sequencing

MOTIVATION: Next-generation RNA sequencing offers an opportunity to investigate the transcriptome at an unprecedented scale. Recent studies have revealed widespread alternative polyadenylation (polyA) in eukaryotes, leading to various mRNA isoforms that differ in their 3' untranslated regions (3'UTR), through which the stability, localization and translation of mRNA can be regulated. However, very few, if any, methods and tools are available for directly analyzing this special alternative RNA processing event. Conventional methods rely on annotation of polyA sites; yet such knowledge remains incomplete, and identification of polyA sites is still challenging. The goal of this article is to develop methods for detecting 3'UTR switching without any prior knowledge of polyA annotations.

RESULTS: We propose a change-point model based on a likelihood ratio test for detecting 3'UTR switching. We develop a directional testing procedure for identifying dramatic shortening or lengthening events in the 3'UTR, while controlling the mixed directional false discovery rate at a nominal level. To our knowledge, this is the first approach to analyze 3'UTR switching directly without relying on any polyA annotations. Simulation studies and applications to two real datasets reveal that our proposed method is powerful, accurate and feasible for the analysis of next-generation RNA sequencing data.

CONCLUSIONS: The proposed method will fill a void among alternative RNA processing analysis tools for transcriptome studies. It can help to obtain additional insights from RNA sequencing data by understanding gene regulation mechanisms through the analysis of 3'UTR switching.

AVAILABILITY AND IMPLEMENTATION: The software is implemented in Java and can be freely downloaded from http://utr.sourceforge.net/.

CONTACT: zhiwei@njit.edu or hongzhe@mail.med.upenn.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
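To make the idea of a change-point likelihood ratio test concrete, the following is a minimal sketch in Java. It is an illustration under stated assumptions, not the implementation distributed at the URL above. It assumes that per-position read counts along one 3'UTR are available for two samples, and that, conditional on the total count at a position, the count from sample A is binomial. Under the null hypothesis the proportion of reads from sample A is constant along the 3'UTR; under the alternative it changes once at an unknown position. The class name `ChangePointSketch` and the methods `binomLogLik` and `scan` are hypothetical names introduced only for this example.

```java
public class ChangePointSketch {

    /**
     * Maximized binomial log-likelihood kernel for x successes out of n trials:
     * x*log(x/n) + (n-x)*log((n-x)/n), treating 0*log(0) as 0.
     * The binomial coefficient is omitted because it cancels in the ratio.
     */
    static double binomLogLik(double x, double n) {
        if (n <= 0) return 0.0;
        double ll = 0.0;
        if (x > 0) ll += x * Math.log(x / n);
        if (n - x > 0) ll += (n - x) * Math.log((n - x) / n);
        return ll;
    }

    /**
     * Scans every split of the region into a left and a right segment.
     * a[i] and b[i] are read counts at position i in samples A and B
     * (an assumed input layout). Returns {bestK, maxLR}, where bestK is the
     * number of positions in the left segment (-1 if no split improves on
     * the null model) and maxLR is the largest statistic
     * 2 * (logLik(left) + logLik(right) - logLik(null)).
     */
    static double[] scan(int[] a, int[] b) {
        int n = a.length;
        double totA = 0, totN = 0;
        for (int i = 0; i < n; i++) {
            totA += a[i];
            totN += a[i] + b[i];
        }
        double nullLL = binomLogLik(totA, totN);

        double bestLR = 0.0;
        int bestK = -1;
        double leftA = 0, leftN = 0;
        for (int k = 1; k < n; k++) {
            // Extend the left segment by one position (running sums).
            leftA += a[k - 1];
            leftN += a[k - 1] + b[k - 1];
            double altLL = binomLogLik(leftA, leftN)
                         + binomLogLik(totA - leftA, totN - leftN);
            double lr = 2.0 * (altLL - nullLL);
            if (lr > bestLR) {
                bestLR = lr;
                bestK = k;
            }
        }
        return new double[] { bestK, bestLR };
    }

    public static void main(String[] args) {
        // Toy counts: sample B loses coverage in the distal part of the 3'UTR.
        int[] sampleA = { 10, 10, 10, 10, 10, 10 };
        int[] sampleB = { 10, 10, 10, 2, 2, 2 };
        double[] result = scan(sampleA, sampleB);
        System.out.println("change point after position " + (int) result[0]
                + ", LR statistic = " + result[1]);
    }
}
```

In this sketch the direction of the event could be read from which segment has the larger proportion of sample A reads. Assessing significance of the maximized statistic, and controlling the mixed directional false discovery rate across many genes, are separate steps that are not shown here.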
