Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs

Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues.

Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate.

Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer.

Contact: cdewey@biostat.wisc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
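The abstract names probabilistic splice graphs without defining them, so a small illustration may help. The sketch below assumes the common reading of the term: a directed acyclic graph whose edges carry transition probabilities, where each start-to-end path is one transcript and its probability is the product of the edge probabilities along it. The vertex names, the weights, and the function `transcript_probabilities` are hypothetical and are not taken from the authors' software.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph (an assumption for illustration only).
# Each vertex maps to a list of (successor, transition probability) pairs;
# the outgoing probabilities of every non-terminal vertex sum to 1.
PSG: Dict[str, List[Tuple[str, float]]] = {
    "start": [("e1", 1.0)],
    "e1": [("e2", 0.7), ("e3", 0.3)],    # e2 is a skippable exon
    "e2": [("e3", 1.0)],
    "e3": [("e4a", 0.4), ("e4b", 0.6)],  # two mutually exclusive exons
    "e4a": [("end", 1.0)],
    "e4b": [("end", 1.0)],
    "end": [],
}


def transcript_probabilities(
    graph: Dict[str, List[Tuple[str, float]]],
    source: str = "start",
    sink: str = "end",
) -> Dict[Tuple[str, ...], float]:
    """Enumerate every source-to-sink path and the product of its edge weights."""
    results: Dict[Tuple[str, ...], float] = {}

    def walk(vertex: str, path: List[str], prob: float) -> None:
        if vertex == sink:
            results[tuple(path)] = prob
            return
        for successor, weight in graph[vertex]:
            walk(successor, path + [successor], prob * weight)

    walk(source, [source], 1.0)
    return results


probs = transcript_probabilities(PSG)
for path, p in sorted(probs.items()):
    print(" -> ".join(path), round(p, 4))
```

In this toy graph, two independent splicing choices yield four transcripts while only two free parameters are needed. Under the stated assumption, the number of paths can grow exponentially with the number of branching vertices while the number of edge parameters grows only linearly, which is one way to read the efficiency and representation issues the Motivation mentions.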
